ations Research, and it has been published as Department of Operations
Research Report No. 44, March 22, 1974.

This report gives examples of the use of difference equations in
the investigation of optimal control policies in queueing systems. It
is the second in a series of four reports which will cover Mr. Reed's
research at Stanford in queueing optimization. Since queueing systems
and their optimization have applications to a number of DOD problems,
the report is reproduced here as an NWC TP for distribution to various
defense installations.

Mr. Reed was supported by Navy Director of Laboratory Programs,
Task Assignment R00001-R01405. Review and report preparation at Stanford
University were supported in part by National Science Foundation
Grant GK-35491 and Army and Navy Contract N00014-67-A-0112-0052
(NR-042-002).


Released  by 

DELBERT  E.  ZILMER,  Head 
Mathematics  Division 
3  June  1974 


Under  authority  of 
HUGH  W.  HUNTER,  Head 
Research  Department 


NWC  Technical  Publication  5594 


Published by: Research Department
Collation: Cover, 52 leaves
First printing: 150 unnumbered copies


UNCLASSIFIED 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE (READ INSTRUCTIONS BEFORE COMPLETING FORM)

1. REPORT NUMBER: NWC TP 5594
2. GOVT ACCESSION NO.:
3. RECIPIENT'S CATALOG NUMBER:
4. TITLE (and Subtitle): DIFFERENCE EQUATIONS AND THE OPTIMAL CONTROL OF SINGLE SERVER QUEUEING SYSTEMS
5. TYPE OF REPORT & PERIOD COVERED: A research report
6. PERFORMING ORG. REPORT NUMBER:
7. AUTHOR(s): Frank C. Reed
8. CONTRACT OR GRANT NUMBER(s):
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Naval Weapons Center, China Lake, CA 93555
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: 1152N, R00001, R01405, 160070-6
11. CONTROLLING OFFICE NAME AND ADDRESS: Naval Weapons Center, China Lake, CA 93555
12. REPORT DATE: June 1974
13. NUMBER OF PAGES: 7
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
15. SECURITY CLASS. (of this report): UNCLASSIFIED
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release; distribution unlimited.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report):
19. KEY WORDS (Continue on reverse side if necessary and identify by block number): Single-Server Queueing Systems, Optimal Control, Expected Discounted Costs, Expected Average Cost per Unit Time, Intermittent Service, Service Rates Selection, Bulk Service, Difference Equations
20. ABSTRACT (Continue on reverse side if necessary and identify by block number): See reverse side

DD FORM 1473 (1 JAN 73)  EDITION OF 1 NOV 65 IS OBSOLETE
S/N 0102-014-6601

UNCLASSIFIED


(U) Difference Equations and the Optimal Control of Single Server
Queueing Systems, by Frank C. Reed. China Lake, Calif., Naval Weapons
Center, June 1974. 102 pp. (NWC TP 5594, publication UNCLASSIFIED).

(U) This report demonstrates the use of difference equations in solving
optimal control problems in single server queueing systems. One obtains
the discounted or relative cost function associated with a specific
stationary policy by solving an appropriate system of difference
equations. The policy improvement algorithm is applied parametrically,
leading to a characterization of the cost function satisfying the
functional equation of optimality. If this cost function satisfies an
appropriate sufficient condition, the associated stationary policy is
optimal.

(U) The method of solution is illustrated by solving three queueing
optimization problems. These problems include optimal control of the
M/G/1 queue with intermittent service, a bulk queueing version of this
same problem, and control of the M/M/1 queue with selection of running
speed. All of these problems have been investigated by other authors.
Results in this report believed to be new include a complete
characterization of optimal policies for the optimal control of the
M/G/1 queue in the discounted case, the extension of the optimal control
of the bulk queueing problem from instantaneous to general service, and
the determination of an optimal speed selection policy for the M/M/1
queue without solving a sequence of truncated problems.




CONTENTS

Introduction
The M/G/1 Queue with Removable Server
    Existence of Stationary Optimal Policy for the Average Cost Case
    Qualitative Attributes of an Optimal Policy for the Average Cost Case
    Quantitative Results Associated with an Optimal Policy in the Average Cost Case
    Qualitative and Quantitative Results for the Discounted Case
Optimal Control of a Bulk Queueing System
    Existence of a Stationary Optimal Policy
    Qualitative Attributes of an Optimal Policy
    Determination of an Optimal Policy
The M/M/1 Queue with Variable Service Rate
    Existence of a Stationary Optimal Policy
    Qualitative Attributes of an Optimal Policy
    Quantitative Results for the Linear Holding Cost Case
Appendixes:
    A. Functional Equations of Optimality
    B. Glossary
References




DIFFERENCE  EQUATIONS  AND  THE  OPTIMAL 
CONTROL  OF  SINGLE  SERVER 
QUEUEING  SYSTEMS 


by 

Frank  C.  Reed 

Stanford  University  and  Naval  Weapons  Center,  China  Lake 


1.  Introduction 

This  report  describes,  by  way  of  example,  the  use  of  difference 
equations  to  obtain  optimal  control  policies  in  single  server  queueing 
systems.  The  difference  equations  solved  in  this  report  give  explicit 
expressions  for  expected  discounted  costs  or  relative  costs  as  defined 
in  Howard  [5]  for  use  in  a  policy  improvement  algorithm  to  solve  the 
dynamic programming functional equations of optimality. Assuming that
a stationary optimal policy exists, the solution is carried out in the
following way:

(i)  Determine  the  qualitative  attributes  of  an  optimal  stationary 
policy,  thus  restricting  the  family  of  stationary  policies  and  class 
of  difference  equations  one  must  consider. 

(ii) If necessary, apply the policy improvement algorithm and determine
the difference equation solution to the functional equations of
optimality.

(iii)  Show  that  the  policy  obtained  under  (ii)  satisfies  sufficient 
conditions  for  optimality. 
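
Step (ii) relies on the policy improvement algorithm of Howard [5]. As a rough illustration of that algorithm only, and not of the parametric difference-equation solution developed in this report, the following Python sketch runs policy improvement on a hypothetical two-state, two-action discounted cost problem. The transition probabilities, costs, discount factor, and all function names are invented for the example.

```python
def evaluate(policy, P, C, beta, sweeps=2000):
    """Expected discounted cost of a fixed stationary policy.

    Uses successive approximation v <- c_f + beta * P_f v, which converges
    for 0 <= beta < 1.  P[s][a] is a list of transition probabilities and
    C[s][a] is the one-step cost of action a in state s.
    """
    n = len(policy)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [C[s][policy[s]]
             + beta * sum(P[s][policy[s]][t] * v[t] for t in range(n))
             for s in range(n)]
    return v


def improve(v, P, C, beta):
    """Pick, in every state, the action minimizing one-step cost plus beta * E[v]."""
    n = len(v)
    policy = []
    for s in range(n):
        q = [C[s][a] + beta * sum(P[s][a][t] * v[t] for t in range(n))
             for a in range(len(C[s]))]
        policy.append(min(range(len(q)), key=q.__getitem__))
    return policy


def policy_iteration(P, C, beta):
    """Alternate evaluation and improvement until the policy repeats."""
    policy = [0] * len(C)
    while True:
        v = evaluate(policy, P, C, beta)
        better = improve(v, P, C, beta)
        if better == policy:
            return policy, v
        policy = better


# Hypothetical data: in state 0, action 0 costs 2 and stays put, action 1
# costs 1 and moves to state 1; in state 1, action 0 is free and stays put,
# action 1 costs 5 and moves back to state 0.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
C = [[2.0, 1.0],
     [0.0, 5.0]]
best_policy, best_cost = policy_iteration(P, C, beta=0.9)
```

In the report the evaluation step is carried out in closed form by solving difference equations rather than by iteration, which is what permits the improvement step to be applied parametrically.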
A classical method of solving queueing optimization problems is to
concentrate on (i) without consideration of the difference equations
involved. Once the family of stationary policies has been suitably
restricted, one may then construct specialized algorithms for arriving
at an optimal policy. Since it is possible to infer qualitative properties
of optimal policies by consideration of (ii), the difference equation
approach allows for flexibility in the combined use of (i) and (ii)
in determining optimal queueing policies. It appears, moreover, that
the solution of the difference equations involved may lead to highly
efficient computing algorithms.

To illustrate the method, optimal control policies are obtained for
three single server queueing systems. Aside from the difference equation
approach, the solutions depend on material presented in Reed [7]. That
report establishes sufficient conditions for both the existence of
stationary optimal policies and the optimality of stationary policies in
Markov decision processes for which an assumption of bounded costs is
not appropriate.

In the discounted cost case it is shown that if costs are non-negative,
a stationary optimal policy exists. Sufficient conditions for the optimality
of a stationary policy require that an explicit expression for the difference
equations associated with the policy be available. Briefly, in the average
cost case it is shown that if

A(1): There is a policy for which the average cost per unit time
is finite,

A(2): There exists a state that is positive recurrent over all
policies,

A(3): Relative costs for all policies are bounded below,

then a stationary optimal policy exists. Sufficient conditions for optimality
of a stationary policy require explicit expressions for relative costs
obtained by solving the appropriate difference equations.

It will be assumed that the reader is familiar with the assumptions,
definitions, notation, and precise form of the above results as presented
in Reed [7]. For reference, a summary of basic results and notation in that
report is given in Appendix B.

Section 2 considers the control of an M/G/1 queue with removable server,
when the optimization criteria are the minimum expected average cost per unit
time and the minimum expected discounted cost over an infinite horizon. The
average cost case has been studied by Heyman [4] and Sobel [10]. The solution
presented here for the average cost case provides a new proof for the
existence of a stationary optimal policy. Proofs different from Heyman's are
given to restrict the class of stationary policies in which an optimal
policy lies, and difference equations are solved to express the average cost
per unit time as a function of two parameters. This function is easily
minimized, and Heyman's results are extended to include rewards and a more
general holding cost assumption. For the discounted case investigated by
Heyman [4], Bell [1], and Blackburn [2], the stationary optimal policy is
obtained using the difference equation approach to solve explicitly the
functional equation associated with optimality. It is then shown that
this solution satisfies sufficient conditions for optimality. The final
result is a complete characterization of all optimal policies, without
resorting to numerical application of the policy improvement algorithm
or stopping rule algorithms.

Section 3 presents two versions of a bulk queueing problem which are
both generalizations of the mail truck problem presented by Ross [8, pp. 164
ff.]. This problem is solved for the average cost case and allows a general
distribution for the time to perform bulk service. The two versions
result from different cost structure assumptions. For both problems it is
shown that a stationary optimal policy exists, and difference equations are
used to construct a policy improvement algorithm.

Section 4 considers the control of the M/M/1 queue with variable
service rate that was originally investigated by Crabill [3], Kakalik [6],
Sabeti [9], and more recently for closed systems by Torbett [11]. For
Crabill's form of this problem it is shown that a stationary optimal policy
exists and that it has a simple form. Unlike Crabill's proof, this proof
avoids truncation of the original problem. Using difference equations, the
expected average cost per unit time is obtained as a function of a parameter
which describes the family of permissible stationary policies. A method of
determining the minimum of this function is presented.


2.  The M/G/1 Queue with Removable Server

Consider the situation where an M/G/1 queue is controlled by turning
the server off and on. Customers arrive according to a Poisson process with
rate $\lambda > 0$. Service times are non-negative, independent random variables
with common distribution function $B$. It is assumed that the mean service
time is

$$\mu^{-1} = \int_0^{\infty} t\, dB(t)$$

and

$$\rho = \lambda\mu^{-1} < 1.$$

There is a cost $R_1$ of turning the server on when he is idle, and
a cost $R_2$ of turning the server off when he is active. There also is a
cost $r_1$ per unit time of maintaining the queueing system when it is idle,
and a cost $r_2$ per unit time of operating the system when the server is
active. In addition, there is a holding cost of $h$ per customer per unit
time. For the average cost case there is a reward $G$ received at the
completion of a service. Rewards are not included in the discounted case.

Decisions  are  made  at  the  time  of  service  completion,  or  if  the  server 
is  idle  decisions  are  made  at  the  time  of  customer  arrival.  In  the  former 
case,  the  server  may  either  remain  active  and  continue  service  or  he  may  shut 
down  and  go  idle.  In  the  latter  case  he  may  either  remain  idle  or  start  up. 
We  associate  k  =  0  with  the  decision  to  remain  idle  or  shut  down.  We 
associate  k  =  1  with  the  decision  to  remain  active  or  start  up. 

The state of the system is described by the pair $(i, j)$ where $i$
indicates the number of customers in the queue (waiting or in service);
$j = 0$ if the server is idle and $j = 1$ if the server is active. We shall
adopt the convention that for any realization of the stochastic process
associated with this queueing system $i$ is right continuous and $j$ is left
continuous with respect to the time parameter.

2.1.  Existence  of  Stationary  Optimal  Policy  for  the  Average  Cost  Case 

We now proceed to verify assumptions A(1), A(2), and A(3) of Reed [7]
for the existence of a stationary optimal policy. At this point we must
place some restrictions on permissible service distributions.

Footnote. Rewards were not considered in Heyman's work and this provides a
minor extension to his results.


LEMMA 2.1.

If $\mu_2 = \int_0^{\infty} t^2\, dB(t) < \infty$, then A(1) is satisfied.


Proof. We consider that stationary policy $f_0$ which always provides
service. It follows from Reed [7] and Appendix A that the difference
equations associated with this policy are

$$v_i - \sum_{k=0}^{\infty} v_{i+k-1} \int_0^{\infty} e^{-\lambda t}\, \frac{(\lambda t)^k}{k!}\, dB(t) + \phi_{f_0}\mu^{-1} = (hi + r_2)\mu^{-1} + \frac{\lambda h \mu_2}{2} - G, \qquad i = 1, 2, \ldots, \tag{2.1}$$

$$v_0 - v_1 + \phi_{f_0}/\lambda = r_2/\lambda,$$

where $v_0 = 0$ allows determination of $\phi_{f_0}$.

Setting

$$v_i = bi + ci^2,$$

substituting in (2.1), and interchanging summation and integration, which
is permissible in this case, we have

$$c = \frac{h\mu^{-1}}{2(1-\rho)},$$

$$b = \frac{h\mu^{-1}}{2(1-\rho)} + \frac{\lambda h \mu_2}{2(1-\rho)^2} - \frac{(\phi_{f_0} - r_2)\mu^{-1} + G}{1-\rho}.$$
Since

$$\phi_{f_0} = \lambda v_1 + r_2 = \lambda(b + c) + r_2,$$

$$\phi_{f_0} = r_2 + h\rho + \frac{\lambda^2 h \mu_2}{2(1-\rho)} - \lambda G. \tag{2.2}$$

Since $\mu_2 < \infty$ we may set $M$ in A(1) equal to $\phi_{f_0}$.
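
As a numerical sanity check on (2.1) and (2.2), the following Python sketch evaluates the quadratic solution $v_i = bi + ci^2$ for the always-serve policy and measures the residual of (2.1). It assumes exponential service with rate $\mu$ (so that $\mu_2 = 2/\mu^2$ and the number of arrivals during one service is geometric); the function names and parameter values are illustrative and do not come from the report.

```python
def always_serve_solution(lam, mu, mu2, h, r2, G):
    """Return (phi, b, c) for the always-serve policy, with v_i = b*i + c*i**2."""
    rho = lam / mu
    if not 0 < rho < 1:
        raise ValueError("need 0 < rho < 1")
    c = (h / mu) / (2 * (1 - rho))
    # Average cost per unit time, equation (2.2).
    phi = r2 + h * rho + lam**2 * h * mu2 / (2 * (1 - rho)) - lam * G
    b = (c + lam * h * mu2 / (2 * (1 - rho) ** 2)
         - ((phi - r2) / mu + G) / (1 - rho))
    return phi, b, c


def residual_exponential(i, lam, mu, h, r2, G, terms=400):
    """Left side minus right side of (2.1) at queue length i >= 1.

    Assumption: service is exponential with rate mu, so mu2 = 2/mu**2 and
    the probability of k arrivals during one service is p * q**k.
    The infinite sum is truncated after `terms` terms.
    """
    mu2 = 2.0 / mu**2
    phi, b, c = always_serve_solution(lam, mu, mu2, h, r2, G)

    def v(n):
        return b * n + c * n * n

    p, q = mu / (lam + mu), lam / (lam + mu)
    carried = sum(p * q**k * v(i + k - 1) for k in range(terms))
    lhs = v(i) - carried + phi / mu
    rhs = (h * i + r2) / mu + lam * h * mu2 / 2 - G
    return lhs - rhs
```

With $h = 1$ and $r_2 = G = 0$ the value returned for `phi` is the mean number in an M/M/1 system, $\rho/(1-\rho)$, which gives a quick independent check of (2.2).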

With respect to the optimal cost function $V$ we have

$$V((i,0)) \le V_{f_0}((i,0)) = R_1 + bi + ci^2,$$

$$V((i,1)) \le V_{f_0}((i,1)) = bi + ci^2,$$

$$\phi \le \phi_{f_0}.$$

LEMMA 2.2. Assumption A(2) is satisfied.

Proof. We must show there exists a state recurrent over all policies $\pi$
for which $\phi_\pi \le \phi_{f_0}$. We define

$$I_0 = \{(i,0): hi + r_1 \le \phi_{f_0} + \lambda G\} = \{(0,0), \ldots, (i_0,0)\},$$

$$I_1 = \{(i,1): hi + r_2 \le \phi_{f_0} + \lambda G\} = \{(0,1), \ldots, (i_1,1)\}.$$

We note that $\phi_\pi + \lambda G$ is the cost per unit time of any policy $\pi$ which
services customers in such a way that the number of customers held without
service does not grow without bound. For any such policy $\pi$ for which

$$\phi_\pi + \lambda G \le \phi_{f_0} + \lambda G$$

the set $I_0 \cup I_1$ is recurrent. Since $1/\lambda < \infty$, any policy $\pi$ will consist
of a sequence of services possibly separated by idle periods of finite duration.
Moreover, since $\mu^{-1} < \infty$, $\pi$ will give rise to a sequence of services with
finite expected times between service. With each service completion the
probability of a transition into $\overline{I_0 \cup I_1}$ (the complement of $I_0 \cup I_1$) exceeds

$$\sum_{k=k_0}^{\infty} \int_0^{\infty} e^{-\lambda t}\, \frac{(\lambda t)^k}{k!}\, dB(t) > 0,$$

where $k_0 = \max[i_0 + 1, i_1 + 1]$. It follows that $\overline{I_0 \cup I_1}$ is recurrent. In
going from $\overline{I_0 \cup I_1}$ to $I_0 \cup I_1$ the state $(\max[i_0 + 1, i_1 + 1], 1)$ must
be entered; hence this state is recurrent over all $\pi$ for which $\phi_\pi \le \phi_{f_0}$.


LEMMA 2.3. If $\mu_2 < \infty$, A(3) holds.

Proof. To prove this lemma we use Reed [7], Theorem 4.3 and Appendix A.
We have

$$C_{(i,0)}(0) = \frac{hi + r_1}{\lambda}, \qquad C_{(i,0)}(1) = R_1,$$

$$C_{(i,1)}(1) = (hi + r_2)\mu^{-1} + \frac{\lambda h \mu_2}{2} - G, \qquad C_{(i,1)}(0) = R_2,$$

$$t_{(i,0)}(0) = \frac{1}{\lambda}, \qquad t_{(i,0)}(1) = 0,$$

$$t_{(i,1)}(1) = \mu^{-1}, \qquad t_{(i,1)}(0) = 0.$$

Since $t_{(i,0)}(1) = t_{(i,1)}(0) = 0$, we must verify that the elimination of
trivial sequences as defined in Reed [7], Section 3.2.2 implies that Condition
1 holds. All trivial sequences have non-negative costs associated with them
and the average cost per unit time cannot be reduced by including them.
Consequently, we may assume that every instantaneous action is followed by
an action with positive expected transition time. It follows that Condition
1 is satisfied.

Since $\mu_2 < \infty$, $\phi_{f_0} < \infty$ and the set $S$ of Theorem 4.3 is finite, it
follows that $V$ is bounded below since the $C_i(k)$ are bounded below.

We  summarize  these  results  in  a  theorem. 


THEOREM 2.4.

If $\mu_2 < \infty$, there exists a stationary optimal policy.

2.2.  Qualitative Attributes of an Optimal Policy for the Average Cost Case

If $f^*$ is an optimal stationary policy, then the cost function $V_{f^*}$
has certain properties which we shall now investigate. We recall the
functional equations which $V_{f^*}$ must satisfy:

$$V_{f^*}(i,0) = \min\left[ V_{f^*}(i+1,0) - \frac{\phi_{f^*}}{\lambda} + \frac{hi + r_1}{\lambda},\; V_{f^*}(i,1) + R_1 \right], \qquad i = 0, 1, 2, \ldots,$$

$$V_{f^*}(i,1) = \min\left[ \sum_{k=0}^{\infty} V_{f^*}(i+k-1,1) \int_0^{\infty} e^{-\lambda t}\, \frac{(\lambda t)^k}{k!}\, dB(t) - \phi_{f^*}\mu^{-1} + (hi + r_2)\mu^{-1} + \frac{\lambda h \mu_2}{2} - G,\; V_{f^*}(i,0) + R_2 \right], \qquad i = 1, 2, \ldots,$$

$$V_{f^*}(0,1) = \min\left[ V_{f^*}(1,1) - \frac{\phi_{f^*}}{\lambda} + \frac{r_2}{\lambda},\; V_{f^*}(0,0) + R_2 \right].$$

With  respect  to  these  functional  equations  we  prove  the  following  lemma. 

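LEMMA 2.5. If $R_1 + R_2 > 0$, then for each $i$ there are only three
possibilities:

(i)

$$V_{f^*}(i,1) = V_{f^*}(i,0) + R_2, \qquad i = 0, 1, 2, \ldots,$$

$$V_{f^*}(i,0) = V_{f^*}(i+1,0) - \frac{\phi_{f^*}}{\lambda} + \frac{hi + r_1}{\lambda}, \qquad i = 0, 1, \ldots$$

(ii)

$$V_{f^*}(i,1) = \sum_{k=0}^{\infty} V_{f^*}(i+k-1,1) \int_0^{\infty} e^{-\lambda t}\, \frac{(\lambda t)^k}{k!}\, dB(t) - \phi_{f^*}\mu^{-1} + (hi + r_2)\mu^{-1} + \frac{\lambda h \mu_2}{2} - G, \qquad i = 1, 2, \ldots,$$

$$V_{f^*}(0,1) = V_{f^*}(1,1) - \frac{\phi_{f^*}}{\lambda} + \frac{r_2}{\lambda},$$

$$V_{f^*}(i,0) = V_{f^*}(i+1,0) - \frac{\phi_{f^*}}{\lambda} + \frac{hi + r_1}{\lambda}, \qquad i = 0, 1, 2, \ldots$$

(iii)

$$V_{f^*}(i,0) = V_{f^*}(i,1) + R_1, \qquad i = 0, 1, \ldots,$$

$$V_{f^*}(i,1) = \sum_{k=0}^{\infty} V_{f^*}(i+k-1,1) \int_0^{\infty} e^{-\lambda t}\, \frac{(\lambda t)^k}{k!}\, dB(t) - \phi_{f^*}\mu^{-1} + (hi + r_2)\mu^{-1} + \frac{\lambda h \mu_2}{2} - G, \qquad i = 1, 2, \ldots,$$

$$V_{f^*}(0,1) = V_{f^*}(1,1) - \frac{\phi_{f^*}}{\lambda} + \frac{r_2}{\lambda}.$$

Proof. The only other possibility is

$$V_{f^*}(i,1) = V_{f^*}(i,0) + R_2,$$

$$V_{f^*}(i,0) = V_{f^*}(i,1) + R_1,$$

so that

$$0 = R_1 + R_2,$$

a contradiction.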

We note that if (i) in Lemma 2.5 holds, then

$$f^*(i,0) = 0$$

and

$$f^*(i,1) = 0.$$

Any $i$ for which (i) in Lemma 2.5 holds is called an idle integer. If (ii)
in Lemma 2.5 holds, then

$$f^*(i,0) = 0,$$

$$f^*(i,1) = 1.$$

Any $i$ for which this holds is called an indifference integer. Finally,
if (iii) in Lemma 2.5 holds, then

$$f^*(i,0) = 1,$$

$$f^*(i,1) = 1,$$

and $i$ will be called a service integer. As a consequence, if $i$ is the
number of customers in the queue, then the determination of an optimal policy
$f^*$ is equivalent to categorizing all integers $i$ as idle, indifference, or
service.

LEMMA 2.6. The set of $i$ for which (i) in Lemma 2.5 holds is bounded for $f^*$.

Proof. Assume the set of $i$ for which (i) holds is unbounded and let $N$
be such that (i) holds and $hN + \min(r_1, r_2) > \phi_{f_0} + \lambda G$. Also let
$S_0 = \{(i,1): i \ge N\}$. If service is never performed, $\phi_{f^*} = \infty > \phi_{f_0}$,
a contradiction. Thus, service periods alternate with finite idle periods and
$S_0$ is accessible in finite time. Once $S_0$ is entered the number in the queue
never drops below $N$.

It follows that

$$\phi_{f^*} + \lambda G \ge hN + \min(r_1, r_2) > \phi_{f_0} + \lambda G,$$

a contradiction. Hence the set of $i$ for which (i) holds is bounded.

LEMMA 2.7. If $R_1 + R_2 > 0$,

$$-R_1 \le V_{f^*}(i,1) - V_{f^*}(i,0) \le R_2,$$

with one equality holding when case (i) or (iii) of Lemma 2.5 holds.

Proof. If (i) in Lemma 2.5 holds,

$$V_{f^*}(i,1) - V_{f^*}(i,0) = R_2.$$

If (iii) in Lemma 2.5 holds,

$$V_{f^*}(i,1) - V_{f^*}(i,0) = -R_1.$$

If (ii) holds, then from the functional equations defining $V_{f^*}$,

$$V_{f^*}(i,0) \le V_{f^*}(i,1) + R_1,$$

$$V_{f^*}(i,1) \le V_{f^*}(i,0) + R_2,$$

from which the desired result follows.

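LEMMA 2.8. There exists an $N$ such that (iii) of Lemma 2.5 holds for all
$i \ge N$.

Proof. From Lemma 2.6 it follows that for $i$ sufficiently large

$$V_{f^*}(i,1) = \sum_{k=0}^{\infty} V_{f^*}(i+k-1,1) \int_0^{\infty} e^{-\lambda t}\, \frac{(\lambda t)^k}{k!}\, dB(t) - \phi_{f^*}\mu^{-1} + (hi + r_2)\mu^{-1} + \frac{\lambda h \mu_2}{2} - G.$$

It follows from Lemma 2.1 that

$$V_{f^*}(i+1,1) - V_{f^*}(i,1) = \frac{h\mu^{-1}}{1-\rho}\, i + \frac{h\mu^{-1}}{1-\rho} + \frac{\lambda h \mu_2}{2(1-\rho)^2} + \frac{(r_2 - \phi_{f^*})\mu^{-1}}{1-\rho} - \frac{G}{1-\rho}.$$

Now assume (ii) in Lemma 2.5 holds. Then

$$V_{f^*}(i+1,0) - V_{f^*}(i,0) = \frac{\phi_{f^*} - r_1}{\lambda} - \frac{h}{\lambda}\, i,$$

so that

$$V_{f^*}(i+1,1) - V_{f^*}(i+1,0) + [V_{f^*}(i,0) - V_{f^*}(i,1)] = \frac{h\mu^{-1} i}{\rho(1-\rho)} + \frac{h\mu^{-1}}{1-\rho} + \frac{\lambda h \mu_2}{2(1-\rho)^2} - \frac{\phi_{f^*}\mu^{-1}}{\rho(1-\rho)} + \frac{r_2\mu^{-1}}{1-\rho} + \frac{r_1\mu^{-1}}{\rho} - \frac{G}{1-\rho}.$$

From Lemma 2.7 we have

$$R_1 + R_2 \ge \frac{h\mu^{-1} i}{\rho(1-\rho)} + k,$$

where $k$ does not depend on $i$. For $i$ sufficiently large this leads to a
contradiction.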

Now applying Lemma 2.6 we let $N_1$ be the maximum integer for which
(i) in Lemma 2.5 holds. If there is no idle integer we set $N_1 = -1$.
Applying Lemma 2.8 we let $N_2$ be the smallest integer greater than or
equal to $N_1$ for which (iii) in Lemma 2.5 holds. From Lemma 2.5 it follows
that $N_1 < N_2$ and $i$ with $N_1 < i < N_2$ are indifference integers. We
shall always be concerned with a well-defined set of closed states which
communicate with $(N_1, 1)$, or $(0, 1)$ if $N_1 = -1$.

Since $(N_2 + j, 0)$ for $j > 0$ are transient with respect to this
closed set, we may specify that decision 1 is made in states $(N_2 + j, 0)$.
Thus all $i \ge N_2$ may be regarded as service integers. Since the states
$(N_1 - j, 0)$ and $(N_1 - j, 1)$ for $j > 0$ are transient with respect to
this closed set, we may arbitrarily specify that decision 0 is made in these
states. Thus all $i \le N_1$ may be regarded as idle integers. We summarize
these remarks in the following theorem:


THEOREM 2.9.

A stationary optimal policy is characterized by two integers, $N_1$
and $N_2$ with $-1 \le N_1 < N_2$, where all $i \le N_1$ are idle integers, all
$i \ge N_2$ are service integers, and all $i$ with $N_1 < i < N_2$ are indifference
integers.
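
The two-threshold structure of Theorem 2.9 translates directly into a decision rule. The following Python sketch is one way to write it down; the function names `classify` and `decision` are illustrative and do not appear in the report. The decision values follow the convention above: 0 means remain idle or shut down, 1 means remain active or start up.

```python
def classify(i, n1, n2):
    """Label queue length i as 'idle', 'indifference', or 'service'."""
    if not -1 <= n1 < n2:
        raise ValueError("need -1 <= n1 < n2")
    if i <= n1:
        return "idle"
    if i >= n2:
        return "service"
    return "indifference"


def decision(i, j, n1, n2):
    """Decision k in state (i, j), where j = 0 (server idle) or 1 (active)."""
    kind = classify(i, n1, n2)
    if kind == "idle":
        return 0          # f(i,0) = 0 and f(i,1) = 0
    if kind == "service":
        return 1          # f(i,0) = 1 and f(i,1) = 1
    return j              # indifference: f(i,0) = 0 and f(i,1) = 1
```

For example, with `n1 = 0` and `n2 = 3` the server shuts down when the queue empties, starts up when the third customer arrives, and otherwise keeps doing whatever it is doing.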

2.3.  Quantitative  Results  Associated  with  an  Optimal  Policy  in  the  Average 
Cost  Case 

We proceed to determine optimal values for $N_1$ and $N_2$.

LEMMA 2.10. Let $f(N_1, N_2)$ be a stationary policy with $N_1$ and $N_2$
defined in Theorem 2.9. Then the average cost per unit time associated with
$f$ is given by
with the result

$$B_0 = \frac{\phi_f - r_1}{\lambda} + \frac{h}{2\lambda}.$$

Setting

$$V(i,1) = A_1 + B_1 i + C_1 i^2$$

gives

$$B_1 = \frac{h\mu^{-1}}{2(1-\rho)} + \frac{(r_2 - \phi_f)\mu^{-1}}{1-\rho} - \frac{G}{1-\rho} + \frac{\lambda h \mu_2}{2(1-\rho)^2},$$

$$C_1 = \frac{h\mu^{-1}}{2(1-\rho)}.$$

Since

$$V(N_2,0) = V(N_2,1) + R_1$$

and

$$V(N_1,0) = V(N_1,1) - R_2,$$

we have

$$N_1(B_0 - B_1) = A_1 - A_0 + (C_1 - C_0)N_1^2 - R_2,$$

$$N_2(B_0 - B_1) = A_1 - A_0 + (C_1 - C_0)N_2^2 + R_1,$$

and

$$B_0 - B_1 = \frac{h(N_2^2 - N_1^2)}{2\lambda(1-\rho)(N_2 - N_1)} + \frac{R_1 + R_2}{N_2 - N_1}.$$

Since

$$B_0 - B_1 = \frac{\phi_f}{\lambda(1-\rho)} - \frac{r_1}{\lambda} - \frac{r_2\mu^{-1}}{1-\rho} - \frac{\lambda h \mu_2}{2(1-\rho)^2} + \frac{h(1-2\rho)}{2\lambda(1-\rho)} + \frac{G}{1-\rho},$$

(2.3) follows.

THEOREM 2.11.

A stationary optimal policy is characterized by a single integer $N$.
If $N = 0$, all $i$ are service integers and $f_0$ is optimal. If $N \ge 1$,
$i = 0$ is an idle integer, all $i \ge N$ are service integers, and $i$ for
which $0 < i < N$ are indifference integers.

Proof. For stationary policies described by $N_1 \ge 0$ and $N_2$, we have

$$\frac{\partial \phi_{f(N_1,N_2)}}{\partial N_1} = \frac{h}{2} + \frac{\lambda(1-\rho)(R_1 + R_2)}{(N_2 - N_1)^2} > 0,$$

and $\phi_{f(N_1,N_2)}$ is minimized by setting $N_1 = 0$. It is easily shown that
$\phi_{f(0,N_2)}$ is convex in $N_2$ and the optimal value of $N_2$ is one of the two
integers adjacent to

$$\hat{N}_2 = \left[ \frac{2\lambda(1-\rho)(R_1 + R_2)}{h} \right]^{1/2}.$$

Defining $\langle a \rangle$ to be the smallest integer greater than or equal to $a$,
the minimum value of $\phi_{f(N_1,N_2)}$ is given by

$$\phi_{f(N_1,N_2)} = \min\left[ \phi_{f_0},\; \phi_{f(0,\langle \hat{N}_2 \rangle - 1)},\; \phi_{f(0,\langle \hat{N}_2 \rangle)} \right].$$

If the minimum is attained for the first entry, $N = 0$. If the minimum
is attained by the second or third entry, then $N = \langle \hat{N}_2 \rangle - 1$ or $N = \langle \hat{N}_2 \rangle$,
respectively.
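
The search in Theorem 2.11 is easy to mechanize. The Python sketch below compares the always-serve cost (2.2) with the costs of the two candidate thresholds next to $\hat{N}_2$. It assumes that the average cost of the policy $f(0, N)$ is $r_1(1-\rho) + r_2\rho + h\rho + \lambda^2 h\mu_2/(2(1-\rho)) + h(N-1)/2 + \lambda(1-\rho)(R_1+R_2)/N - \lambda G$, which is what the relations among $B_0$, $B_1$, $C_0$, and $C_1$ in the proof of Lemma 2.10 give when $N_1 = 0$; the function names and the numerical values used afterward are hypothetical.

```python
import math


def always_serve_cost(lam, mu, mu2, h, r2, G=0.0):
    """Average cost per unit time of the always-serve policy, equation (2.2)."""
    rho = lam / mu
    return r2 + h * rho + lam**2 * h * mu2 / (2 * (1 - rho)) - lam * G


def n_policy_cost(N, lam, mu, mu2, h, r1, r2, R1, R2, G=0.0):
    """Assumed average cost of f(0, N) for N >= 1 (see the lead-in)."""
    rho = lam / mu
    return (r1 * (1 - rho) + r2 * rho + h * rho
            + lam**2 * h * mu2 / (2 * (1 - rho))
            + h * (N - 1) / 2
            + lam * (1 - rho) * (R1 + R2) / N
            - lam * G)


def optimal_N(lam, mu, mu2, h, r1, r2, R1, R2, G=0.0):
    """Return (N, cost): N = 0 means always serve, N >= 1 is the start-up level."""
    rho = lam / mu
    if not 0 < rho < 1 or h <= 0:
        raise ValueError("need 0 < rho < 1 and h > 0")
    n_hat = math.sqrt(2 * lam * (1 - rho) * (R1 + R2) / h)
    upper = max(1, math.ceil(n_hat))
    costs = {0: always_serve_cost(lam, mu, mu2, h, r2, G)}
    for N in (upper - 1, upper):
        if N >= 1:  # f(0, N) needs N > N_1 = 0
            costs[N] = n_policy_cost(N, lam, mu, mu2, h, r1, r2, R1, R2, G)
    best = min(costs, key=costs.get)
    return best, costs[best]


# Hypothetical M/M/1 example: lam = 0.5, mu = 1 (so mu2 = 2), h = 1,
# r1 = 0, r2 = 4, R1 = 2, R2 = 4.
N_star, cost_star = optimal_N(0.5, 1.0, 2.0, 1.0, 0.0, 4.0, 2.0, 4.0)
```

Because the cost of $f(0, N)$ is convex in $N$, checking only the two integers adjacent to $\hat{N}_2$ is enough; a brute-force scan over $N$ is a convenient way to confirm this on any particular data set.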

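These results are easily extended to include different holding rates, $h_1$
when the server is idle and $h_2$ when active. In this case

$$\phi_{f(N_1,N_2)} = r_1(1-\rho) + r_2\rho + h_2\rho + \frac{\lambda^2 h_2 \mu_2}{2(1-\rho)} + \frac{(N_2 - 1 + N_1)}{2}\,[h_2\rho + h_1(1-\rho)] + \frac{\lambda(1-\rho)(R_1 + R_2)}{N_2 - N_1} - \lambda G,$$

$$N_1 = 0,$$

$$\hat{N}_2 = \left[ \frac{2\lambda(1-\rho)(R_1 + R_2)}{h_2\rho + h_1(1-\rho)} \right]^{1/2}.$$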
2.4.  Qualitative and Quantitative Results for the Discounted Case

Since $G = 0$, $C_i(k) \ge 0$, and it follows from Reed [7] Theorem 3.12
that there exists a stationary $\beta$-optimal policy. Our plan is to determine
a $\beta$-optimal improvement policy as defined in Reed [7] Section 3.2.2 and
then impose sufficient conditions that this policy be optimal. The final
result is a complete characterization of all optimal policies, without
resorting to numerical application of policy improvement or stopping rule
algorithms. As a consequence both qualitative and quantitative results
will be obtained together.

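We define $V_\beta(i,j)$ as the total expected discounted cost over an
infinite horizon given the process begins in state $(i,j)$. From Reed [7]
and Appendix A, it follows that the functional equations associated with an
optimal stationary policy are as follows:

$$V_\beta(i,0) = \min\left\{ \frac{\lambda}{\lambda+\beta}\, V_\beta(i+1,0) + \frac{hi + r_1}{\lambda+\beta},\; V_\beta(i,1) + R_1 \right\}, \qquad i \ge 0,$$

$$V_\beta(0,1) = \min\left\{ V_\beta(0,0) + R_2,\; \frac{\lambda}{\lambda+\beta}\, V_\beta(1,1) + \frac{r_2}{\lambda+\beta} \right\},$$

$$V_\beta(i,1) = \min\left\{ V_\beta(i,0) + R_2,\; \sum_{k=0}^{\infty} V_\beta(i+k-1,1) \int_0^{\infty} \frac{(\lambda t)^k}{k!}\, e^{-(\lambda+\beta)t}\, dB(t) + (hi + r_2)\, \frac{1 - \tilde{B}(\beta)}{\beta} + \frac{\lambda h}{\beta^2} \left( 1 - \tilde{B}(\beta) - \beta \int_0^{\infty} t e^{-\beta t}\, dB(t) \right) \right\}, \qquad i \ge 1,$$

where $\tilde{B}(\beta) = \int_0^{\infty} e^{-\beta t}\, dB(t)$.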
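We shall refer to the following equations in characterizing integers
associated with various types of policies:

$$V_\beta(i,0) = \frac{\lambda}{\lambda+\beta}\, V_\beta(i+1,0) + \frac{hi + r_1}{\lambda+\beta}, \qquad i \ge 0, \tag{2.4}$$

$$V_\beta(0,1) = \frac{\lambda}{\lambda+\beta}\, V_\beta(1,1) + \frac{r_2}{\lambda+\beta},$$

$$V_\beta(i,1) = \sum_{k=0}^{\infty} V_\beta(i+k-1,1) \int_0^{\infty} \frac{(\lambda t)^k}{k!}\, e^{-(\lambda+\beta)t}\, dB(t) + (hi + r_2)\, \frac{1 - \tilde{B}(\beta)}{\beta} + \frac{\lambda h}{\beta^2} \left( 1 - \tilde{B}(\beta) - \beta \int_0^{\infty} t e^{-\beta t}\, dB(t) \right), \qquad i \ge 1, \tag{2.5}$$

$$V_\beta(i,0) = V_\beta(i,1) + R_1, \tag{2.6}$$

$$V_\beta(i,1) = V_\beta(i,0) + R_2. \tag{2.7}$$

Any $i$ for which (2.4) and (2.7) hold is defined as an idle integer. Any $i$
for which (2.5) and (2.6) hold is defined as a service integer, and any $i$
for which (2.4) and (2.5) hold is defined as an indifference integer.
Since $C_i(k) \ge 0$, the elimination of trivial sequences implies that there are
no integers for which (2.6) and (2.7) hold.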

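We consider solutions to (2.4) and (2.5) of the form

$$V_\beta(i,0) = A_1 + B_1 i + K_1 \Lambda(\beta)^{-i},$$

$$V_\beta(i,1) = A_2 + B_2 i + K_2 G(\beta)^{i},$$

where $\Lambda(\beta) = \lambda/(\lambda+\beta)$ and $G(\beta)$ satisfies

$$G(\beta) = \tilde{B}(\beta + \lambda(1 - G(\beta))).$$

Defining

$$\phi_1 = R_1 + \frac{r_2 - r_1}{\beta}$$

and

$$H(\beta) = \frac{h\tilde{B}(\beta)}{\beta(1 - \tilde{B}(\beta))},$$

the solution for (2.4) and (2.5) (except for $i = 0$) is specified by

$$A_1 = r_1/\beta + \lambda h/\beta^2,$$

$$A_2 = r_2/\beta + \lambda h/\beta^2 - H(\beta),$$

$$B_1 = B_2 = h/\beta,$$

and $K_1$ and $K_2$ are determined by boundary conditions that hold. Since
$\beta$ is fixed, we shall set $\Lambda(\beta) = \Lambda$, $G(\beta) = G$, and $H(\beta) = H$.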
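
The quantity $G(\beta)$ is defined only implicitly, so in practice it has to be computed. A minimal Python sketch follows, assuming the Laplace-Stieltjes transform of the service distribution is available as a callable; the names `busy_period_root`, `holding_constant`, and `exponential_root_closed_form` are illustrative. Starting the iteration at zero and repeatedly applying the transform yields the root in $(0, 1)$; for exponential service the result can be compared with the closed-form root of the resulting quadratic.

```python
import math


def busy_period_root(lst, lam, beta, tol=1e-14, max_iter=10_000):
    """Solve G = lst(beta + lam * (1 - G)) by fixed-point iteration from G = 0.

    `lst(s)` must return the transform of the service distribution,
    i.e. the integral of exp(-s * t) dB(t).
    """
    g = 0.0
    for _ in range(max_iter):
        nxt = lst(beta + lam * (1.0 - g))
        if abs(nxt - g) < tol:
            return nxt
        g = nxt
    raise RuntimeError("fixed-point iteration did not converge")


def holding_constant(lst, h, beta):
    """H(beta) = h * B(beta) / (beta * (1 - B(beta))), with B the transform."""
    b = lst(beta)
    return h * b / (beta * (1.0 - b))


def exponential_root_closed_form(lam, mu, beta):
    """Smaller root of lam*G**2 - (lam + mu + beta)*G + mu = 0.

    Valid only under the assumption of exponential service with rate mu,
    where lst(s) = mu / (mu + s).
    """
    s = lam + mu + beta
    return (s - math.sqrt(s * s - 4.0 * lam * mu)) / (2.0 * lam)
```

For exponential service with rate $\mu$ the constant $H(\beta)$ reduces to $h\mu/\beta^2$, which provides a second simple check on an implementation.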
THEOREM 2.12.

If $H \le \phi_1 \le H\Lambda(1-G)/(1-\Lambda G) + R_1 + R_2$, that policy for which all
$i \ge 0$ are indifference integers is a $\beta$-optimal improvement policy.

Proof. In this case we solve (2.4) and (2.5) for all $i$, where the equation
in $V_\beta(0,1)$ imposes a boundary condition from which $K_2$ is determined.
Imposing this boundary condition gives

$$K_2 = H(1-\Lambda)/(1-\Lambda G)$$

and, since there is no boundary condition on $K_1$, we set $K_1 = 0$. The
above solution is optimal if for all $i$

$$V_\beta(i,0) \le V_\beta(i,1) + R_1,$$

$$V_\beta(i,1) \le V_\beta(i,0) + R_2.$$

In the former case we have for all $i$

$$A_1 + B_1 i \le A_2 + B_2 i + K_2 G^i + R_1,$$

i.e.,

$$\phi_1 \ge \sup_i H\left(1 - \frac{(1-\Lambda)}{(1-\Lambda G)}\, G^i\right) = H.$$

In the latter case we have for all $i$

$$A_2 + B_2 i + K_2 G^i \le A_1 + B_1 i + R_2,$$

i.e.,

$$\phi_1 \le \inf_i H\left(1 - \frac{(1-\Lambda)}{(1-\Lambda G)}\, G^i\right) + R_1 + R_2 = \frac{H\Lambda(1-G)}{1-\Lambda G} + R_1 + R_2.$$

If $\phi_1 < H$, there will be an $n$ such that

$$\phi_1 < H\left(1 - \frac{(1-\Lambda)}{(1-\Lambda G)}\, G^i\right)$$

for all $i \ge n$. From the policy improvement algorithm the total discounted
cost will be reduced by making all $i \ge n$ service integers.


For $\psi_1 < H$ we consider the solution of the transcendental equation

$$\psi_1 = H(1 - G^x),$$

giving

$$x = \frac{\log(1 - \psi_1/H)}{\log G}. \tag{2.8}$$

We also set $n_0 = \langle x \rangle$, where $\langle x \rangle$ is the smallest integer greater than or equal to x.
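As an illustration of how the threshold n_0 can be obtained in practice, the sketch below computes it in two ways: by scanning for the smallest integer i with ψ_1 ≤ H(1 − G^i), and by rounding up the closed-form root x. The function names and the numeric values are hypothetical; the sketch assumes H > 0, 0 < G < 1 and ψ_1 < H.

```python
import math

def n0_threshold(psi1, H, G, max_i=100_000):
    """Smallest integer i >= 0 with psi1 <= H * (1 - G**i)."""
    if not (H > 0.0 and 0.0 < G < 1.0 and psi1 < H):
        raise ValueError("sketch assumes H > 0, 0 < G < 1 and psi1 < H")
    for i in range(max_i + 1):
        if psi1 <= H * (1.0 - G ** i):
            return i
    raise RuntimeError("threshold not found within max_i")

def x_root(psi1, H, G):
    """Real root of psi1 = H * (1 - G**x)."""
    return math.log(1.0 - psi1 / H) / math.log(G)

# Hypothetical parameter values, for illustration only.
n0_a = n0_threshold(7.0, 10.0, 0.5)
n0_b = n0_threshold(4.0, 10.0, 0.8)
```

The integer scan avoids the rounding ambiguity that can arise when x is computed in floating point and happens to lie very close to an integer.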


THEOREM 2.13.

If $\psi_1 < H$ and $R_1 + R_2 \ge H(1-A)/(1-AG)$, then that policy which makes all $i \ge n_0$ service integers and all non-negative $i < n_0$ indifference integers is a $\beta$-optimal improvement policy.

Proof. In this case we solve (2.5) for all i, (2.4) for all i with $0 \le i < n_0$, and (2.6) for all $i \ge n_0$. The solution requires that

$$V_\beta(n_0,0) = V_\beta(n_0,1) + R_1,$$

which gives

$$K_1 = A^{n_0}\left[\psi_1 - H\left(1 - \frac{(1-A)}{(1-AG)}\,G^{n_0}\right)\right]$$

and

$$K_2 = H\,\frac{(1-A)}{(1-AG)}.$$

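For this solution to be optimal we must have

$$V_\beta(i,0) \le \frac{\lambda}{\lambda+\beta}\,V_\beta(i+1,0) + \frac{hi + r_1}{\lambda+\beta}, \qquad i \ge n_0.$$

This is equivalent to

$$A_2 + B_2 i + K_2 G^i + R_1 \le \frac{\lambda}{\lambda+\beta}\left(A_2 + B_2 i + B_2 + K_2 G^{i+1} + R_1\right) + \frac{hi + r_1}{\lambda+\beta},$$

which reduces to

$$\psi_1 \le H(1 - G^i), \qquad i \ge n_0.$$

Since $H(1-G^i)$ is increasing in i, and by definition this inequality is satisfied for $i = n_0$, it is satisfied for all $i \ge n_0$.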
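The reduction in this step is stated without the intermediate algebra. The following worked derivation is a sketch of one way to fill it in. It rests on two assumptions that are not legible in this copy of the report: that $A = \lambda/(\lambda+\beta)$, and that $\psi_1 - H = A_2 - A_1 + R_1$, which is the reading consistent with the constants $A_1$, $A_2$, $B_2$ and $K_2$ given earlier.

```latex
% Assumptions: A = \lambda/(\lambda+\beta), so 1 - A = \beta/(\lambda+\beta);
% \psi_1 - H = A_2 - A_1 + R_1; B_2 = h/\beta; A_1 = r_1/\beta + \lambda h/\beta^2.
\begin{align*}
&V_\beta(i,0) - A\,V_\beta(i+1,0) \le \frac{hi + r_1}{\lambda+\beta} \\
&\iff (1-A)(A_2 + B_2 i + R_1) - A B_2 + K_2 G^i (1 - AG)
      \le (1-A)\left(\frac{h i}{\beta} + \frac{r_1}{\beta}\right) \\
&\iff (1-A)\left(A_2 + R_1 - \frac{r_1}{\beta} - \frac{\lambda h}{\beta^2}\right)
      + K_2 G^i (1 - AG) \le 0
      \qquad \text{since } A B_2 = (1-A)\,\frac{\lambda h}{\beta^2} \\
&\iff (1-A)(\psi_1 - H) + H(1-A)\,G^i \le 0
      \qquad \text{using } K_2 = H\,\frac{1-A}{1-AG} \\
&\iff \psi_1 \le H(1 - G^i).
\end{align*}
```

The terms in $hi$ cancel because $(1-A)B_2 i = hi/(\lambda+\beta)$, which is why the final condition depends on i only through $G^i$.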
Also for $i \ge n_0$ we must have

$$V_\beta(i,1) \le V_\beta(i,0) + R_2 = V_\beta(i,1) + R_1 + R_2,$$

or equivalently

$$0 \le R_1 + R_2, \qquad i \ge n_0,$$

which is satisfied.

For $i < n_0$ we must have

$$V_\beta(i,0) \le V_\beta(i,1) + R_1,$$

i.e.,

$$A_1 + B_1 i + K_1 A^{-i} \le A_2 + B_2 i + K_2 G^i + R_1.$$

Substitution for $K_1$ and $K_2$ yields the equivalent condition

$$\psi_1 \ge H - HG^i\,\frac{(1-A)(1-(AG)^{n_0-i})}{(1-AG)(1-A^{n_0-i})}. \tag{2.9}$$

If we add and subtract $HG^{n_0-1}$ to the right hand side of (2.9) above, the equivalent condition is

$$\begin{aligned}
\psi_1 &\ge H(1 - G^{n_0-1}) - HG^i\,\frac{(1-A)}{(1-A^{n_0-i})}\left[1 + AG + \cdots + (AG)^{n_0-i-1} - G^{n_0-i-1}(1 + A + \cdots + A^{n_0-i-1})\right] \\
&= H(1 - G^{n_0-1}) - HG^i\,\frac{(1-A)}{(1-A^{n_0-i})}\left[1 - G^{n_0-i-1} + AG(1 - G^{n_0-i-2}) + \cdots + (AG)^{n_0-i-2}(1-G)\right],
\end{aligned}$$

where the second expression on the right is non-negative for all $i \le n_0 - 1$. Since $\psi_1 > H(1 - G^{n_0-1})$, the desired result follows.


Also for $i < n_0$ we must have

$$V_\beta(i,1) \le V_\beta(i,0) + R_2,$$

i.e.,

$$(\psi_1 - H)(1 - A^{n_0-i}) \le R_1 + R_2 - HG^i\,\frac{(1-A)}{(1-AG)}\left(1 - (AG)^{n_0-i}\right).$$

Since

$$R_1 + R_2 \ge H\,\frac{(1-A)}{(1-AG)} \ge H\,\frac{(1-A)}{(1-AG)}\,G^i,$$

the above inequality holds if

$$(\psi_1 - H)(1 - A^{n_0-i}) \le H\,\frac{(1-A)}{(1-AG)}\,(AG)^{n_0-i}\,G^i,$$

i.e., if

$$\psi_1 \le H + H\,\frac{(1-A)(AG)^{n_0-i}\,G^i}{(1-AG)(1-A^{n_0-i})}.$$

Since by assumption $\psi_1 \le H$, the above inequality is satisfied if $i < n_0$.


Corollary 2.14. If $\psi_1 < H$ and $\psi_1 > 0$, $n_0 > 0$.


Proof. The proof is immediate from the solution for x in (2.8) and the fact that $n_0 \ge x > 0$.

If $R_1 + R_2 < H(1-A)/(1-AG)$, it may be possible to introduce idle integers and obtain a policy improvement. Toward this end we consider the following transcendental equation in y:

$$(H - \psi_1)\left[(1-A)(1-(AG)^y) - G^y(1-AG)(1-A^y)\right] = (R_1 + R_2)(1-AG)\,G^y \tag{2.10}$$

and define

$$n_1 = \langle y \rangle,$$

providing (2.10) has a solution. A sufficient condition that (2.10) has a solution is contained in the following lemma.

Lemma 2.15. If $\psi_1 < H$, there exists an $n_1$ such that

$$(H - \psi_1)\left[(1-A)(1-(AG)^i) - G^i(1-AG)(1-A^i)\right] \ge (R_1 + R_2)(1-AG)\,G^i, \qquad i \ge n_1$$

$$(H - \psi_1)\left[(1-A)(1-(AG)^i) - G^i(1-AG)(1-A^i)\right] < (R_1 + R_2)(1-AG)\,G^i, \qquad 0 \le i < n_1.$$

Proof. Since

$$(1-A)(1-(AG)^i) - G^i(1-AG)(1-A^i) = (1-A)(1-AG)\left[1 - G^i + AG(1-G^{i-1}) + \cdots + (AG)^{i-1}(1-G)\right] \ge 0$$

and $H - \psi_1 > 0$, both sides of the inequality above are non-negative. The right hand side of (2.10) is strictly decreasing in i. Moreover,

$$\left[(1-A)(1-(AG)^{i+1}) - G^{i+1}(1-AG)(1-A^{i+1})\right] - \left[(1-A)(1-(AG)^i) - G^i(1-AG)(1-A^i)\right] = G^i(1-AG)(1-G)(1-A^{i+1}) > 0,$$

so the left side of (2.10) is strictly increasing in i. Since the left side of (2.10) is 0 for y = 0 and the right side of (2.10) is > 0 for y = 0, the result follows.
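The monotonicity argument above suggests a simple numerical procedure: scan i upward until the left side of (2.10) first reaches the right side. The sketch below does this and also evaluates the series form of the bracket used in the proof. Function names and parameter values are hypothetical; the sketch assumes 0 < A < 1, 0 < G < 1 and ψ_1 < H.

```python
def bracket_closed(i, A, G):
    """(1-A)(1-(AG)^i) - G^i (1-AG)(1-A^i), the bracket in (2.10)."""
    return (1 - A) * (1 - (A * G) ** i) - G ** i * (1 - A * G) * (1 - A ** i)

def bracket_series(i, A, G):
    """Series form from the proof: (1-A)(1-AG) * sum_k (AG)^k (1 - G^(i-k))."""
    return (1 - A) * (1 - A * G) * sum(
        (A * G) ** k * (1 - G ** (i - k)) for k in range(i)
    )

def n1_threshold(psi1, H, A, G, R_sum, max_i=100_000):
    """Smallest i >= 0 at which the left side of (2.10) reaches the right side."""
    for i in range(max_i + 1):
        lhs = (H - psi1) * bracket_closed(i, A, G)
        rhs = R_sum * (1 - A * G) * G ** i
        if lhs >= rhs:
            return i
    raise RuntimeError("threshold not found within max_i")

# Hypothetical parameter values, for illustration only.
n1_small_R = n1_threshold(psi1=4.0, H=10.0, A=0.9, G=0.8, R_sum=0.5)
n1_large_R = n1_threshold(psi1=4.0, H=10.0, A=0.9, G=0.8, R_sum=4.0)
```

Because the left side increases toward (H − ψ_1)(1 − A) while the right side decays to 0, the scan terminates whenever ψ_1 < H. Larger switching costs R_1 + R_2 push n_1 upward.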

We  now  prove  a  somewhat  stronger  form  of  Theorem  2.13. 

THEOREM 2.16.

If $\psi_1 < H$ and $n_0 < n_1$, then that policy which makes all $i \ge n_0$ service integers and all non-negative $i < n_0$ indifference integers is a $\beta$-optimal improvement policy.

Proof. The proof is exactly the same as in Theorem 2.13 until we check if

$$V_\beta(i,1) \le V_\beta(i,0) + R_2,$$

i.e., for $i < n_0$,

$$(\psi_1 - H)(1 - A^{n_0-i}) \le R_1 + R_2 - HG^i\,\frac{(1-A)}{(1-AG)}\left(1 - (AG)^{n_0-i}\right). \tag{2.11}$$

From Lemma 2.15 with $n_0 < n_1$

$$(H - \psi_1)\left[(1-A)(1-(AG)^{n_0-i}) - G^{n_0-i}(1-AG)(1 - A^{n_0-i})\right] < (R_1 + R_2)(1-AG)\,G^{n_0-i}. \tag{2.12}$$

Inequality (2.11) may be rewritten as

$$-G^{n_0-i}(H-\psi_1)(1-A^{n_0-i})(1-AG) \le G^{n_0-i}(R_1 + R_2)(1-AG) - G^{n_0}H(1-A)(1-(AG)^{n_0-i}).$$

Adding and subtracting $(H-\psi_1)(1-A)(1-(AG)^{n_0-i})$ to the left hand side of (2.11) yields

$$(H-\psi_1)\left[(1-A)(1-(AG)^{n_0-i}) - G^{n_0-i}(1-AG)(1-A^{n_0-i})\right] - (H-\psi_1)(1-A)(1-(AG)^{n_0-i})$$

$$\le G^{n_0-i}(R_1 + R_2)(1-AG) - G^{n_0}H(1-A)(1-(AG)^{n_0-i}).$$

From (2.12) above this inequality will be satisfied if

$$(H - \psi_1) \ge G^{n_0} H,$$

which is true by definition of $n_0$.

We now consider that policy which makes i = 0 an idle integer, $i \ge n_1$ service integers, and all positive $i < n_1$ indifference integers.

THEOREM 2.17.

If $\psi_1 < H$ and $n_1 < n_0$, then that policy which makes all $i \ge n_1$ service integers, all positive $i < n_1$ indifference integers and i = 0 an idle integer is a $\beta$-optimal improvement policy.

Proof. In this case we solve (2.4) for all i such that $0 < i < n_1$, (2.5) for all $i \ge 1$, (2.6) for all $i \ge n_1$ and (2.7) for i = 0. The solution is of the form

$$V_\beta(i,0) = A_1 + B_1 i + K_1 A^{-i}$$

$$V_\beta(i,1) = A_2 + B_2 i + K_2 G^i,$$

where $K_1$ and $K_2$ are determined by the boundary conditions

$$V_\beta(n_1,0) = V_\beta(n_1,1) + R_1$$

$$V_\beta(0,1) = V_\beta(0,0) + R_2,$$

which implies

$$K_1 - K_2 (AG)^{n_1} = A^{n_1}(\psi_1 - H)$$

$$-K_1 + K_2 = R_1 + R_2 - (\psi_1 - H),$$

i.e.,

$$K_2 = \frac{(R_1 + R_2) + (H - \psi_1)(1 - A^{n_1})}{1 - (AG)^{n_1}}$$

$$K_1 = \frac{(AG)^{n_1}(R_1 + R_2) - A^{n_1}(1 - G^{n_1})(H - \psi_1)}{1 - (AG)^{n_1}}.$$

For optimality we must have for $i \ge n_1$

$$V_\beta(i,0) \le \frac{\lambda}{\lambda+\beta}\,V_\beta(i+1,0) + \frac{hi + r_1}{\lambda+\beta},$$

which reduces to the condition

$$(H - \psi_1) \ge \frac{\left[(R_1 + R_2) + (H-\psi_1)(1 - A^{n_1})\right] G^i (1-AG)}{(1-A)(1-(AG)^{n_1})}.$$

Clearly the right hand side is decreasing in i, and from Lemma 2.15 the inequality is satisfied for $i = n_1$ so that it holds for all $i \ge n_1$. Also for $i \ge n_1$ we must have

$$V_\beta(i,1) \le V_\beta(i,0) + R_2.$$

Since

$$V_\beta(i,0) = V_\beta(i,1) + R_1,$$

this condition is satisfied.

For $i < n_1$ we check if

$$V_\beta(i,0) \le V_\beta(i,1) + R_1,$$

which after some algebraic manipulation results in the condition

$$(H-\psi_1)\left[(1 - A^{n_1-i})(1-(AG)^{n_1}) - G^i(1-(AG)^{n_1-i})(1 - A^{n_1})\right] \le G^i(R_1 + R_2)(1-(AG)^{n_1-i}).$$


It may be shown that for $0 \le i \le n_1$, the coefficient of $H - \psi_1$ is non-negative. Recalling that

$$(H-\psi_1) < \frac{(R_1 + R_2)\,G^{n_1-1}(1-AG)}{(1-A)(1-(AG)^{n_1-1}) - G^{n_1-1}(1-AG)(1-A^{n_1-1})},$$

optimality will be satisfied if

$$\frac{G^{n_1-1}(1-AG)}{(1-A)(1-(AG)^{n_1-1}) - G^{n_1-1}(1-AG)(1-A^{n_1-1})} \le \frac{G^i(1-(AG)^{n_1-i})}{(1-A^{n_1-i})(1-(AG)^{n_1}) - G^i(1-(AG)^{n_1-i})(1-A^{n_1})},$$

which results in

$$G^i(1-(AG)^{n_1-i})\left[(1-A)(1-(AG)^{n_1-1}) - G^{n_1-1}(1-AG)(1-A^{n_1-1}) + G^{n_1-1}(1-AG)(1-A^{n_1})\right]$$

$$\ge G^{n_1-1}(1-AG)(1-A^{n_1-i})(1-(AG)^{n_1}).$$

Since

$$(1-A)(1-(AG)^{n_1-1}) - G^{n_1-1}(1-AG)(1 - A^{n_1-1} - 1 + A^{n_1}) = (1-A)(1-(AG)^{n_1}),$$

the condition for optimality will be satisfied if

$$G^i(1-(AG)^{n_1-i})(1-A) \ge G^{n_1-1}(1-AG)(1-A^{n_1-i}),$$

i.e., if

$$G^i(1-AG)(1-A)\left[1 - G^{n_1-i-1} + AG(1 - G^{n_1-i-2}) + \cdots + (AG)^{n_1-i-2}(1-G)\right] \ge 0.$$

Since this inequality holds for $0 < i \le n_1 - 1$, the result follows.

For $0 < i < n_1$ we check if

$$V_\beta(i,1) \le V_\beta(i,0) + R_2,$$

which reduces to the equivalent condition

$$(H-\psi_1)\left[(1-A^{n_1-i})(1-(AG)^i) - G^i(1-(AG)^{n_1-i})(1-A^i)\right] \ge -(R_1 + R_2)\left[1 - G^i + A^{n_1-i}G^{n_1}(1-A^i)\right].$$

One may show that

$$\begin{aligned}
&(1-A^{n_1-i})(1-(AG)^i) - G^i(1-(AG)^{n_1-i})(1-A^i) \\
&\quad = (1-A)(1-AG)\big[\,1 - G^i + AG(1-G^{i-1}) + \cdots + A^{i-1}G^{i-1}(1-G) \\
&\qquad + A(1-G^{i+1}) + A^2G(1-G^i) + \cdots + A^iG^{i-1}(1-G^2) \\
&\qquad + \cdots \\
&\qquad + A^{n_1-i-1}(1-G^{n_1-1}) + A^{n_1-i}G(1-G^{n_1-2}) + \cdots + A^{n_1-2}G^{i-1}(1-G^{n_1-i})\,\big].
\end{aligned}$$

It follows that the left hand side of the above inequality is non-negative and the associated condition for optimality is satisfied.

For  i  =  0  we  check  if 


■V°*1) 


m  V1-15 


2 

A+  $ 


which is equivalent to

$$(H-\psi_1)(1-A^{n_1})(1-AG) - (1-A)(1-(AG)^{n_1})H \le -(R_1 + R_2)(1-AG),$$

i.e.,

$$-G^{n_1}(H-\psi_1)(1-A^{n_1})(1-AG) + G^{n_1}(1-A)(1-(AG)^{n_1})H + (H-\psi_1)(1-A)(1-(AG)^{n_1}) - H(1-A)(1-(AG)^{n_1})$$

$$\ge G^{n_1}(1-AG)(R_1 + R_2) - \psi_1(1-A)(1-(AG)^{n_1}).$$

From Lemma 2.15 and the definition of $n_1$ the condition for optimality will be satisfied if

$$H(1 - G^{n_1}) \le \psi_1.$$

Since $n_1 < n_0$, this condition is satisfied.

THEOREM 2.18.

If $n^* = n_0 = n_1$, $\psi_1 < H$ and

$$(H-\psi_1)(1-A^{n^*})(1-AG) - H(1-A)(1-(AG)^{n^*}) \ge -(R_1 + R_2)(1-AG),$$

the policy of Theorem 2.16 is a $\beta$-optimal improvement policy. If $n^* = n_0 = n_1$, $\psi_1 < H$ and

$$(H-\psi_1)(1-A^{n^*})(1-AG) - H(1-A)(1-(AG)^{n^*}) \le -(R_1 + R_2)(1-AG),$$

the policy of Theorem 2.17 is a $\beta$-optimal improvement policy.

Proof. In the proof of Theorem 2.16 we note that the argument for optimality goes through for $n_0 \le n_1$ providing we restrict our attention to i > 0. For i = 0 we must have

$$(H-\psi_1)(1-A^{n_0})(1-AG) - H(1-A)(1-(AG)^{n_0}) \ge -(R_1 + R_2)(1-AG).$$

In the proof of Theorem 2.17 we have the condition

$$(H-\psi_1)(1-A^{n_1})(1-AG) - H(1-A)(1-(AG)^{n_1}) \le -(R_1 + R_2)(1-AG).$$

With $n^* = n_0 = n_1$, at least one of these results must hold.

We now prove two additional theorems which allow a complete characterization of all optimal policies.


THEOREM 2.19.

If

$$\max\left\{\frac{HA(1-G)}{(1-AG)} + R_1 + R_2,\ H\right\} < \psi_1 < H + R_1 + R_2,$$

then that policy for which i = 0 is an idle integer and all $i \ge 1$ are indifference integers is a $\beta$-optimal improvement policy.

Proof. In this case

$$V_\beta(i,0) = A_1 + B_1 i$$

$$V_\beta(i,1) = A_2 + B_2 i + K_2 G^i,$$

where $K_2$ is determined from

$$V_\beta(0,1) = V_\beta(0,0) + R_2,$$

so

$$K_2 = H - \psi_1 + R_1 + R_2.$$

For optimality we must have for i > 0

$$V_\beta(i,0) \le V_\beta(i,1) + R_1,$$

i.e.,

$$\psi_1 - H \ge -\left[H + R_1 + R_2 - \psi_1\right] G^i. \tag{2.13}$$

Since

$$H \le \psi_1 \le H + R_1 + R_2,$$

the left side of (2.13) is non-negative and the right hand side of (2.13) is non-positive.

For $i \ge 0$ we must have

$$V_\beta(i,1) \le V_\beta(i,0) + R_2,$$

i.e.,

$$\psi_1 - H \le \left[\psi_1 - H - (R_1 + R_2)\right] G^i + R_1 + R_2.$$

This inequality is immediately satisfied for i = 0, and for i > 0 we have

$$(\psi_1 - H)(1 - G^i) \le (R_1 + R_2)(1 - G^i),$$

i.e.,

$$\psi_1 \le H + R_1 + R_2,$$

which holds by assumption.

Finally  we  must  check  if 

V°’D  -  i+e  V1,u  +  we  • 


i.e.,

$$\psi_1 \ge \frac{HA(1-G)}{(1-AG)} + R_1 + R_2,$$

which holds by assumption.


THEOREM 2.20.

If $\psi_1 \ge H + R_1 + R_2$, then that policy for which all $i \ge 0$ are idle integers is a $\beta$-optimal improvement policy.


Proof. In this case

$$V_\beta(i,0) = A_1 + B_1 i,$$

$$V_\beta(i,1) = A_1 + B_1 i + R_2.$$
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We  now  prove  a  theorem  which  allows  us  to  replace  the  expression 
$\beta$-optimal improvement policy by $\beta$-optimal policy in all the preceding theorems.


THEOREM 2.21.

All $\beta$-optimal improvement policies of Theorems 2.12, 2.13, 2.16, 2.17, 2.18, 2.19, and 2.20 are $\beta$-optimal policies.

Proof. Since 0 < G < 1, $V_\beta(i,j) \le Ki$ for some $K \ge 0$, all $i \ge 0$ and j = 0 or 1 in all theorems in question. It follows from Reed [7] Theorem 3.17 that the implied policies are optimal if

$$\lambda \int t\,dB(t) < \infty.$$

Since we have assumed throughout this investigation that $\lambda > 0$ and $\rho = \lambda\mu^{-1} < 1$, the result follows.
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At  this  point  we  define  three  basically  different  policies. 

Definition: A policy of type A is determined by an integer $n_0$ such that all $i \ge n_0$ are service integers and all non-negative $i < n_0$ are indifference integers. For $n_0 = \infty$, all integers are indifference integers.

Definition: A policy of type B is determined by an integer $n_1$ such that all $i \ge n_1$ are service integers, all positive $i < n_1$ are indifference integers and 0 is an idle integer. If $n_1 = \infty$, all $i \ge 1$ are indifference integers and i = 0 is an idle integer.

Definition: A policy is of type C if all $i \ge 0$ are idle integers.

We  now  give  a  theorem  which  completely  characterizes  all  possible 
optimal  policies. 

THEOREM 2.22. A stationary optimal policy exists for any combination of queueing system parameters, and the appropriate policy is given in the following table:

    Conditions                                                        Optimal Policy

    $\psi_1 \ge H + R_1 + R_2$                                        C

    $\max\{HA(1-G)/(1-AG) + R_1 + R_2,\ H\} < \psi_1 < H + R_1 + R_2$  B ($n_1 = \infty$)

    $H \le \psi_1 \le HA(1-G)/(1-AG) + R_1 + R_2$                      A ($n_0 = \infty$)

    $\psi_1 < H$ and $n_0 < n_1$, or $\psi_1 < H$, $n_0 = n_1$, and     A ($n_0 < \infty$)
    $(H-\psi_1)(1-A^{n_0})(1-AG) - H(1-A)(1-(AG)^{n_0})$
    $\ge -(R_1 + R_2)(1-AG)$

    $\psi_1 < H$ and $n_1 < n_0$, or $\psi_1 < H$, $n_0 = n_1$, and     B ($n_1 < \infty$)
    $(H-\psi_1)(1-A^{n_1})(1-AG) - H(1-A)(1-(AG)^{n_1})$
    $< -(R_1 + R_2)(1-AG)$
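The table of Theorem 2.22 can be read as a decision procedure. The sketch below is one hedged way to encode it; the function names, the return convention (policy type plus threshold) and the numeric values are hypothetical. The thresholds n_0 and n_1 are found by integer scans of the defining inequalities, and the boundary case ψ_1 = H that falls outside the second and third rows is assigned to type B here as an explicit assumption.

```python
import math

def _n0(psi1, H, G, max_i=100_000):
    # Smallest i with psi1 <= H (1 - G^i); requires psi1 < H.
    for i in range(max_i + 1):
        if psi1 <= H * (1.0 - G ** i):
            return i
    raise RuntimeError("n0 not found")

def _n1(psi1, H, A, G, R_sum, max_i=100_000):
    # Smallest i at which the left side of (2.10) reaches the right side.
    for i in range(max_i + 1):
        lhs = (H - psi1) * ((1 - A) * (1 - (A * G) ** i)
                            - G ** i * (1 - A * G) * (1 - A ** i))
        if lhs >= R_sum * (1 - A * G) * G ** i:
            return i
    raise RuntimeError("n1 not found")

def classify_policy(psi1, H, A, G, R1, R2):
    """Return (policy type, threshold) following the table of Theorem 2.22."""
    R_sum = R1 + R2
    if psi1 >= H + R_sum:
        return ("C", None)
    if psi1 >= H:
        bound = H * A * (1 - G) / (1 - A * G) + R_sum
        # Assumption: values with psi1 >= H not covered by row A go to row B.
        return ("A", math.inf) if psi1 <= bound else ("B", math.inf)
    n0 = _n0(psi1, H, G)
    n1 = _n1(psi1, H, A, G, R_sum)
    if n0 < n1:
        return ("A", n0)
    if n1 < n0:
        return ("B", n1)
    n = n0  # tie: decide with the extra inequality of Theorem 2.18
    tie = (H - psi1) * (1 - A ** n) * (1 - A * G) - H * (1 - A) * (1 - (A * G) ** n)
    return ("A", n) if tie >= -R_sum * (1 - A * G) else ("B", n)

# Hypothetical parameter values, for illustration only.
example = classify_policy(psi1=4.0, H=10.0, A=0.9, G=0.8, R1=2.0, R2=2.0)
```

In this example R_1 + R_2 exceeds H(1 − A)/(1 − AG), so the hypotheses of Theorem 2.13 hold and the procedure selects a type A policy.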

For $\psi_1 \ge H$ the results above are identical to Blackburn [2] Chapter 2 Lemmas 6 and 7. For $\psi_1 < H$ the above results provide a characterization of optimal policies different from Blackburn's.

It should be noted that we have derived closed form expressions for all expected discounted costs given any starting state. In particular for type A policies
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Optimal  Control  of  a  Bulk  Queueing  System 


Consider a single-server queueing system that is controlled by turning the server on and performing a bulk service, after which he is turned off. Customers arrive according to a Poisson process with rate $\lambda > 0$. Once bulk service begins, all customers in the queue at the time of service initiation are served. Bulk service times are non-negative, independent random variables with common distribution function B. It should be stressed that B in no way depends on the number of customers served in bulk, and all customers beginning service together are finished together. Customers arriving while the service facility is busy form a new queue; consequently, a bulk service may end with a queue of any length.

We shall also assume that

$$\mu^{-1} = \int t\,dB(t) < \infty$$

and

$$\mu_2 = \int t^2\,dB(t) < \infty.$$

Decisions are made at the time of service completions, or with the arrival of a customer if the server is not busy. Since it is assumed that the server is shut down (at least momentarily) at the completion of service, the decision at all permissible decision times is whether to remain idle or provide service for customers currently in the queue. We let k = 0 imply the decision to remain idle and we let k = 1 imply the decision to provide service.

The state of the system at a decision time is i, the number of people in the queue immediately after an arrival (when the server is idle) or immediately after completion of a bulk service. As in Section 2, i is right continuous with respect to the time parameter. The only exception is that i is discontinuous at decision points for which k = 1 when $\mu^{-1} = 0$.

There is a fixed charge R of providing the bulk service, independent of the number in the queue. There is a holding cost of h per unit time for each customer waiting for service. We shall make two alternative assumptions with regard to holding costs for customers during the service period. Under one assumption holding costs for a customer end once he enters service; under the other assumption there is a holding cost of h per unit time for each customer while being served. If the former of these assumptions holds, we shall be concerned with problem 1. If the second of these alternative assumptions holds, we shall be concerned with problem 2. We seek a policy which minimizes the expected average cost per unit time for both problems 1 and 2.
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S 


0 


R  + 


Ahy2 

2 


for  problem  2. 
Setting 


v±  =  Bi 

and  substituting  in  (3.1)  yields 


B  = 


S 


1 


For  problem  1 


For  problem  2 


Ahy 

-1  R  +  “2i’ 
♦f0  =  AiJ  h  + - ~ 

0  v 


It  is  interesting  to  note  that  for  problem  2  there  is  an  optimal  expected 
service  time 


y 


-l 


Ahy  n 

R  +  -ir 

Ah 


1 

2 


for  which 
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A 


* 


=  2 


Ah(R  + 


1/2 


If  y  ^  =  0,  we  consider  that  policy,  f^,  which  always  waits  for  the 
arrival  of  a  single  customer  before  providing  instantaneous  service.  In 
this  case  it  is  easily  shown  that 


LEMMA  3.2.  Assumption  A(2)  holds. 


Proof. If n customers arrive during time t, then it is a property of the Poisson distribution that the total cost associated with holding these customers is hnt/2, where $E(t) = n/\lambda$. Now let $\pi$ be any policy for which the probability that n customers are allowed to arrive between the conclusion of one service and the start of the next service is given by $P_n$ for n = 0, 1, ... . If service is never provided, we set $P_\infty = 1$. Since $R \ge 0$, it can only add to the average cost per unit time. Neglecting this cost and all holding costs for customers who arrive while a service is in progress,


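$$\phi_\pi \ge \left[\sum_n P_n\,\frac{hn}{2}\,\frac{n}{\lambda}\right]\left[\sum_n P_n\,\frac{n}{\lambda} + \mu^{-1}\right]^{-1}$$

$$\ge \left[\sum_{n=0}^{N-1} P_n\,\frac{hn^2}{2\lambda} + \frac{hN}{2}\sum_{n=N}^{\infty} P_n\,\frac{n}{\lambda}\right]\left[\sum_{n=0}^{N-1} \frac{P_n n}{\lambda} + \sum_{n=N}^{\infty} \frac{P_n n}{\lambda} + \mu^{-1}\right]^{-1}$$

$$= \lim_{k \to \infty}\left[\sum_{n=0}^{N-1} \frac{P_n n^2 h}{2\lambda} + \frac{hN}{2}\sum_{n=N}^{k} \frac{P_n n}{\lambda}\right]\left[\sum_{n=0}^{N-1} \frac{P_n n}{\lambda} + \sum_{n=N}^{k} \frac{P_n n}{\lambda} + \mu^{-1}\right]^{-1}.$$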

If $\mu_\pi$, the mean time between the end and start of service, is $\infty$, then

$$\lim_{k \to \infty} \sum_{n=N}^{k} \frac{P_n n}{\lambda} = \infty$$

and

$$\phi_\pi \ge hN/2$$

for all N, showing that $\phi_\pi = \infty$. Lemma 3.1 excludes such policies and hence $\mu_\pi < \infty$. Thus for any permissible policy $\pi$ the mean recurrence time between bulk services is $\mu_\pi + \mu^{-1} < \infty$.

Each time bulk service is performed there is a probability, $\int e^{-\lambda t}\,dB(t) > 0$, that the state 0 will be entered. Thus, for all permissible policies 0 is a positive recurrent state with mean recurrence time of $(\mu_\pi + \mu^{-1}) / \int e^{-\lambda t}\,dB(t)$.

LEMMA  3.3.  Assumption  A(3)  holds. 
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C^(j)  and  t^(j)  and  computing  C^(j)  -  (J>^  t^(j).  We  summarize 
these  results  in  a  theorem. 

THEOREM 3.4.

If $\mu_2 < \infty$, there exists a stationary optimal policy for bulk queueing problems 1 and 2.

3.2.  Qualitative  Attributes  of  an  Optimal  Policy 

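We now investigate the attributes of $V_{f^*}$, the cost function associated with an optimal policy $f^*$. From Reed [7], Theorem 4.1, the optimal cost function must satisfy

$$V_{f^*}(i) = \min\left[\,V_{f^*}(i+1) + \frac{hi}{\lambda} - \frac{\phi_{f^*}}{\lambda},\ \ \sum_{k=0}^{\infty} V_{f^*}(k) \int e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,dB(t) + S_0 + S_1 i - \phi_{f^*}\mu^{-1}\right].$$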

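LEMMA 3.5. The set of i for which

$$V_{f^*}(i) = \sum_{k=0}^{\infty} V_{f^*}(k) \int e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,dB(t) + S_0 + S_1 i - \phi_{f^*}\mu^{-1}$$

is unbounded.

Proof. Suppose the contrary is true, so that there exists an n such that

$$V_{f^*}(i) = \frac{hi}{\lambda} - \frac{\phi_{f^*}}{\lambda} + V_{f^*}(i+1), \qquad i = n, n+1, \ldots$$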
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Now  suppose  that  an  upper  bound  N  >  n  is  imposed  on  the  number  of  customers 
allowed  in  the  queue.  Let  <()^*(N)  be  the  average  cost  per  unit  time  for 
this  bounded  problem  under  the  policy  f  ,  where  For  i  =  N 

we  have 


V<  «im 


+  Vm"1' 


so that

$$\phi_{f^*}(N) = hN.$$

In the limit, $\phi_{f^*}(N) \to \infty$, contradicting the assumption that $\phi_{f^*} < \infty$.

LEMMA 3.6. The set of i for which

$$V_{f^*}(i) = \frac{hi}{\lambda} - \frac{\phi_{f^*}}{\lambda} + V_{f^*}(i+1)$$

is bounded.


Proof.  Assume  the  contrary  is  true.  From  Lemma  3.5  it  follows  there  is 
an  unbounded  sequence  {i^}  such  that 
Now since $\phi_{f^*} < \infty$, there exists an $i_j$ such that

$$V_{f^*}(i_j) - V_{f^*}(i_j + 1) > 0.$$

Also

$$V_{f^*}(i_j) - V_{f^*}(i_j + 1) \le -S_1 \le 0,$$

since $S_1 \ge 0$. This gives a contradiction.

Since the possible actions in each state are (1) perform service immediately or (2) hold customers for future service, determining a stationary optimal policy is nothing more than deciding if each state i is a "service state" or a "holding state". Lemma 3.5 merely says that service states are unbounded and Lemma 3.6 says that holding states are bounded. We may restrict our search for an optimal stationary policy to policies with this attribute.

We  summarize  this  result  in  a  theorem. 

THEOREM 3.7.

If $0 < \lambda < \infty$ and $\mu_2 < \infty$, then a stationary optimal policy exists and is characterized by having its set of holding states bounded.

3.3.  Determination  of  an  Optimal  Policy 

We  now  show  that  an  optimal  improvement  policy  exists  which  is  a  special 
case  of  the  policies  described  in  Theorem  3.7.  Finally,  we  shall  show  that 
this optimal improvement policy is optimal. We shall consider the following policy:

Hold customers until n are in the queue and then start bulk service. If there are n or more customers in the new queue at the end of bulk service, begin a new bulk service. If there are fewer than n at the end of bulk service, wait until there are n and then begin bulk service. We shall refer to the above policy as a monotone policy with parameter n.


The difference equations associated with this policy are

$$v_i = v_{i+1} - \frac{\phi_n}{\lambda} + \frac{hi}{\lambda}, \qquad i = 0, 1, 2, \ldots, n-1 \tag{3.2}$$

$$v_i - \sum_{k=0}^{\infty} v_k \int_0^\infty e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,dB(t) + \mu^{-1}\phi_n = S_0 + S_1 i, \qquad i = n, n+1, \ldots, \tag{3.3}$$

where $v_0 = 0$.

In this case we attempt to find a solution of the form

$$v_i = bi + ci^2, \qquad i = 0, 1, 2, \ldots, n-1,$$

and

$$v_i = K_n + Bi, \qquad i = n, n+1, \ldots$$

Substituting in (3.3) for i = n, n+1, ..., we have

$$K_n + Bi - \sum_{k=0}^{n-1} (bk + ck^2) \int_0^\infty e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,dB(t) - \sum_{k=n}^{\infty} (K_n + Bk) \int_0^\infty e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,dB(t) + \mu^{-1}\phi_n = S_0 + S_1 i.$$

Setting $B = S_1$, we have

$$K_n P(n-1, 0) + \phi_n\left(\mu^{-1} - P(n-2, 1)\right) = S_0 - \frac{\lambda h}{2}\,P(n-3, 2) + S_1 \lambda\left(\mu^{-1} - P(n-2, 1)\right), \tag{3.4}$$

where

$$P(n,r) = \int_0^\infty \sum_{k=0}^{n} e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,t^r\,dB(t)$$

is obtained in the course of interchanging summation and integration, which is permissible from the properties of the exponential function and the fact that B is a distribution function.

Substitution for i = n-1 in (3.2) yields

$$\frac{n\phi_n}{\lambda} = K_n + \frac{hn(n-1)}{2\lambda} + S_1 n. \tag{3.5}$$

Clearly, equations (3.4) and (3.5) provide two linear equations in $K_n$ and $\phi_n$, which provide a solution for the average cost per unit time for such a policy. We shall consider the form of improvement that can occur if the policy improvement algorithm is applied to a monotone policy with parameter n.
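To make the two linear equations (3.4) and (3.5) concrete, the sketch below solves them for K_n and φ_n in a special case. It assumes exponentially distributed bulk service times with rate `mu`, which is an illustrative choice not made in the report; under that assumption P(n, r) has the closed form used in `P`. The function names and numeric values are hypothetical, and S_0 and S_1 are passed in as given constants.

```python
import math

def P(n, r, lam, mu):
    """P(n, r) for exponential service with rate mu (assumption); 0 when n < 0."""
    return sum(
        mu * lam ** k * math.factorial(k + r)
        / (math.factorial(k) * (lam + mu) ** (k + r + 1))
        for k in range(n + 1)
    )

def monotone_cost(n, lam, mu, h, S0, S1):
    """Solve (3.4) and (3.5) for (K_n, phi_n) for a monotone policy, n >= 1."""
    if n < 1:
        raise ValueError("sketch assumes n >= 1")
    # (3.4): a11 * K_n + a12 * phi_n = b1
    a11 = P(n - 1, 0, lam, mu)
    a12 = 1.0 / mu - P(n - 2, 1, lam, mu)
    b1 = (S0 - 0.5 * lam * h * P(n - 3, 2, lam, mu)
          + S1 * lam * (1.0 / mu - P(n - 2, 1, lam, mu)))
    # (3.5) rearranged: -K_n + (n / lam) * phi_n = h n (n - 1) / (2 lam) + S1 n
    a21, a22 = -1.0, n / lam
    b2 = h * n * (n - 1) / (2.0 * lam) + S1 * n
    det = a11 * a22 - a12 * a21
    K_n = (b1 * a22 - a12 * b2) / det
    phi_n = (a11 * b2 - a21 * b1) / det
    return K_n, phi_n

# Hypothetical values: search a range of n for the smallest average cost.
lam, mu, h, S0, S1 = 1.0, 2.0, 1.0, 5.0, 0.0
best_n = min(range(1, 40), key=lambda n: monotone_cost(n, lam, mu, h, S0, S1)[1])
```

A brute-force search over n, as in the last line, is only a cross-check; the policy improvement argument that follows reaches the same kind of parameter without enumerating every n.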

If there exists $i \ge n$ such that

$$v_i > v_{i+1} + \frac{hi}{\lambda} - \frac{\phi_n}{\lambda},$$

then an improvement in policy will be obtained by starting bulk service only after $n^* > n$ customers have arrived. If, on the other hand, there exists i < n such that

$$v_i > \sum_{k=0}^{\infty} v_k \int_0^\infty e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,dB(t) - \mu^{-1}\phi_n + S_0 + S_1 i,$$

then there will be a set S(n) such that an improvement in policy will be obtained by starting service for all $i \in S(n)$ and all $i \ge n$.

For $i \ge n$, $v_i = K_n + S_1 i$, so the first condition becomes $n \le i < (\phi_n - \lambda S_1)/h$, and we define

$$W(n) = \{\, i : n \le i < (\phi_n - \lambda S_1)/h \,\}.$$
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policy  improvement  algorithm  to  a  monotone  policy. 


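LEMMA 3.8. If $S(n) \ne \emptyset$, then $W(n) = \emptyset$. Thus, if $W(n) \ne \emptyset$, then $S(n) = \emptyset$.


Proof. Assume there exists $i_1 \ge n$ such that

$$i_1 < \frac{\phi_n - \lambda S_1}{h}$$

and there exists $i_0 \le n-1$ such that

$$\frac{\phi_n - \lambda S_1}{h} < \frac{n + i_0 - 1}{2}.$$

Therefore,

$$\frac{\phi_n - \lambda S_1}{h} < \frac{n + i_0 - 1}{2} \le n - 1 < n \le i_1 < \frac{\phi_n - \lambda S_1}{h},$$

a contradiction.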


LEMMA 3.9. Policy improvement of a monotone policy leads to a monotone policy.


Proof. If $W(n) \ne \emptyset$ we define

$$n^* = \max\{\, i : n \le i < (\phi_n - \lambda S_1)/h \,\} + 1.$$

The monotone policy with parameter $n^* \ge n+1$ is a result of the policy improvement.

If $S(n) \ne \emptyset$, then for all $i \in S(n)$, inequality (3.6) is satisfied. We observe that if there exists i < n for which (3.6) is satisfied, then the inequality (3.6) will hold for i, i+1, ..., n-1. In this case, if we define $n^* = \min\{ i : i \in S(n) \}$, then the policy improvement algorithm leads to a monotone policy with parameter $n^*$.
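One improvement step on a monotone policy can be sketched as follows. The sets follow the conditions used in Lemmas 3.8 and 3.9: W(n) collects i ≥ n with i < (φ_n − λS_1)/h, and S(n) is taken here to be the set of i < n with (φ_n − λS_1)/h < (n + i − 1)/2, which is the inequality appearing in the proof of Lemma 3.8. The function names and numeric values are hypothetical, and φ_n is assumed to have been computed already for the current n.

```python
import math

def holding_set_W(n, phi_n, lam, S1, h):
    """W(n): states i >= n where holding improves on the current policy."""
    bound = (phi_n - lam * S1) / h
    return [i for i in range(n, math.ceil(bound)) if i < bound]

def service_set_S(n, phi_n, lam, S1, h):
    """S(n): states i < n where starting service improves on the current policy."""
    bound = (phi_n - lam * S1) / h
    return [i for i in range(n) if bound < (n + i - 1) / 2.0]

def improve_parameter(n, phi_n, lam, S1, h):
    """One policy improvement step on a monotone policy with parameter n."""
    W = holding_set_W(n, phi_n, lam, S1, h)
    if W:
        return max(W) + 1          # raise the threshold (Lemma 3.9, first case)
    S = service_set_S(n, phi_n, lam, S1, h)
    if S:
        return min(S)              # lower the threshold (Lemma 3.9, second case)
    return n                       # W(n) and S(n) both empty: converged

# Hypothetical values, for illustration only.
raised = improve_parameter(3, phi_n=10.0, lam=1.0, S1=2.0, h=1.0)
lowered = improve_parameter(6, phi_n=4.0, lam=1.0, S1=1.0, h=1.0)
```

In a full iteration φ_n would be recomputed from (3.4) and (3.5) after every change of n, and the loop would stop when both sets are empty.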


LEMMA 3.10. If $n_j$ is the parameter associated with the $j$th iteration of the policy improvement algorithm and $j_i$ is the iteration on which $W(n_{j_i - 1}) \ne \emptyset$ for the $i$th time, then $W(n_{j_i}) = \emptyset$ and $n_{j_i} < n_{j_{i-1}}$.
Ji 


Proof . 

Part  (i)  W(n.  )  =  3>. 

Ji 

th 

The  x  iteration  leads  to 


max{k: 


n. 

1 


_iik 


(*n.  _L  -  ASp/h}  +  1, 
Ji 


from  which  it  follows  that 


(<f>n.  -  -  AS.)/h  <  n,  . 

Ji  Ji 


If  on  step  j .+1,  W(n.  )  ^  $,  then  there  would  exist  an  i^  such  that 
1  Ji 


n.  £  i^  <  (4>n .  -  AS^/h, 

^i  Ji 


which  implies 
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<f>n  ,  <  <j>n  , 


a  contradiction.  Hence  W(n.  )  =  $  and  either  S(n.  )  =  $  or  S(n.  )  ^  $ 

Ji  Ji  Ji 

In  the  former  case  the  policy  improvement  algorithm  has  converged.  In  the 

latter  case  the  policy  improvement  algorithm  goes  on. 

Part  (ii)  (n.  }  is  strictly  decreasing  in  i: 

Ji 


If  S(n.  )  #  <t>,  then  on  step  j.+l  there  exists  in  <  n. 

Ji  u  Ji 


such  that 


4>n  -  XSX  n.  +  iQ-l 


1  n.  _1> 

Ji 


so  that  n.  >  n .  . , .  If,  moreover,  S(n.  )  ^  $  for  m  =  l,2,...,r, 

we  obtain  in  the  same  way  a  sequence  n  >  n.  ->...>  n.  ,  stopping 

Ji  J i+I  3^  +r 

only  when  S(n^  +r+^)  *  #•  If  W(n^  +r+^)  =  $,  the  policy  improvement 
algorithm  has  converged.  If  W(n^  +r+-^)  ^  then 


max{k:  n 


,  <  k  <  (<pn  )  -  ASj/h}  +  1. 


<t>n.  .  -  AS  <J>n  -AS 

Ji+1  Ji  A 


i— K — <  “j  - 1- 


-j.  '  ASi 

Ji+1 

n.  <  - r - +  1  <  n.  -  1  +  1  *  n,  , 



n.  <  n.  . 

Ji+1  Ji 

THEOREM  3.11. 

The  Policy  Improvement  Algorithm  terminates  with  a  monotone,  0-optimal 
improvement  policy. 


Proof.  Application  of  the  policy  improvement  algorithm  gives  rise  to  a 
sequence  of  the  following  form: 


n .  >  n ,  , -  >  ...  >  n .  - 

i0  v1  Ji_1 


n  s  >  n ,  .  -|  ^  •  •  •  >  n  -i 
Ji+1  J?*"1 


n.  >  n ,  ,  -  >  . . .  >  n  -i 
J2  J2+1  J3"1 


n .  >11.  ,  -  >  ...  >  n  -j 

jk  jk+i 


k+1 


Since

$$n_{j_1} > n_{j_2} > \cdots > n_{j_k},$$

this sequence must stop in a finite number of steps with some $n^*$ with $W(n^*) = S(n^*) = \emptyset$. Otherwise, $n_{j_i} \to -\infty$ and clearly, $n_j \ge 0$ for all j.

The resulting policy from Lemma 3.9 is monotone with parameter $n^*$.


THEOREM  3.12. 

The  monotone,  0-optimal  improvement  policy  of  Theorem  3.11  is  optimal. 
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*  * 
Proof. We let $f^*$ be that stationary monotone policy with parameter $n^*$ obtained from convergence of the policy improvement algorithm. If $S_1 = 0$,

$$V_{f^*}(i) = \left(\frac{\phi_{n^*}}{\lambda} + \frac{h}{2\lambda}\right) i - \frac{h}{2\lambda}\,i^2, \qquad i = 0, 1, 2, \ldots, n^*-1$$

$$V_{f^*}(i) = K_{n^*}, \qquad i = n^*, n^*+1, \ldots .$$

Clearly the $V_{f^*}(i)$ are bounded and from Reed [7], Corollary 4.9 the resulting stationary policy is optimal.

If $S_1 > 0$,

$$V_{f^*}(i) = \left(\frac{\phi_{n^*}}{\lambda} + \frac{h}{2\lambda}\right) i - \frac{h}{2\lambda}\,i^2, \qquad i = 0, 1, \ldots, n^*-1$$

$$V_{f^*}(i) = K_{n^*} + S_1 i, \qquad i = n^*, n^*+1, \ldots .$$

In this case the $V_{f^*}(i)$ are unbounded and from Reed [7], Theorem 4.8 a sufficient condition for optimality is $x(f)V_{f^*} < \infty$ for all f. Lemma 3.6 shows that the Markov matrix of all feasible stationary policies f will be of the form
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poo^f) 

p01(« 

P0n 

pl0(f) 

pii  (» 

Pln 

PN-10^f^ 

•  •  •  P« 

Nn 

po 

P1 

P 

where 


It  follows  that 


P.  =  /  e”At  dB(t) 

1  n  1* 


N-l 

x .(f)  =  l  x  (f)P  Af)  +  P.  I  x  (f) 

1  j=0  3  J1  j=N  J 

There  are  only  two  possibilities  for  P^f)  for  j  =  0,  1, 
Either  P^Cf)  =  P±  for  all  i  or  P^Cf)  =  0  for  1  >  N> 

T  =  {j:  0  <  j  <  N-l  and  P.±(f)  =0  for  i  > 

J 


Now 


l 

+  p,  l 


X. (f)  =  l  x,(f)P,.(f)  +  p±  L_  x,(f) 
1  jeT  3  3%  jeT  J 


where  T  is  the  complement  of  T  on  {0,  1,  2,  ...}.  Now 


=  0,  1,  2, 

2,  ....  N-l 
Let 

N}. 
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X  xi<f)  V(±)  =  X  X  x,(f)  P1t(£)Vf*(i)  +  X  V  *(i)  l  x  (f) 

i=0  1  i=0  jeT  J  31  1  1=0  1  jeT  J 

N 

=  X  X  x  (f)p  v  *a)  +  x_  x4  (f )  X  v  *(i>. 

i=0  jeT  J  J1  1  ieT  J  i=0 


We  observe  that  V£*(i)  <  A  +  Bi  with  B  >  0  and 

i  — 

N 

X  X  x* Cf)p.±  v  *(D 

i=0  jeT  J  J1  1 

00 

is  clearly  bounded.  Also  X_x-(f)  is  bounded  and  X  x  (f)  V-*(i)  < 

jeT  3  i=0  1  * 

00 

if  and  only  if  X  PjV_*(i)  is  bounded,  and  this  expression  is  bounded 
i=0 


if  and  only  if  X  iP •  <  00 .  The  generating  function  associated  with  P. 

i=0  1  i 


K(z)  =  X  P.zJ  =  /  e  Xt+Xtz  dB(t), 
j=0  J  0 


with  mean 


K'(l)  =  /  XtdB(t)  =  Xp 
0 


We have assumed $0 < \lambda < \infty$ and $\mu^{-1} < \infty$, so $\lambda\mu^{-1} < \infty$ and the sufficient condition for optimality is satisfied.


4.  The M/M/1 Queue with Variable Service Rate

Consider an M/M/1 queue that is controlled by selecting one of two service rates. Customers arrive according to a Poisson process with rate $\lambda > 0$. Service times are non-negative, independent random variables with an exponential distribution with service rate $\mu_1$ or $\mu_2$ with $0 < \mu_1 < \mu_2 < \infty$. We assume that $\rho_2 = \lambda/\mu_2 < 1$.

There is a cost $r_i$ per unit time of operating the queueing system with service rate $\mu_i$, $i = 1, 2$. It is assumed that $r_1 < r_2$. If $i$ customers are in the queueing system, there is a holding cost rate of $c_i$, where $c^* \le c_0 \le c_1 \le \dots$, $c_i \to \infty$ as $i \to \infty$, and $c^* > -\infty$ is a lower bound on the holding cost rate.

Decisions are made at the time of service completion or the arrival of a customer in the system. We associate $k = 1$ with the decision to use service rate $\mu_1$ and we associate $k = 2$ with the decision to use service rate $\mu_2$.

The state of the system is described by the pair $(i, j)$ where $i$ indicates the number of customers in the queue and $j$ implies service rate $\mu_j$ is in use. As in Section 2, $i$ is right continuous and $j$ is left continuous with respect to the time parameter.

The optimization criterion is minimum expected average cost per unit time.

4.1.  Existence  of  a  Stationary  Optimal  Policy: 

We now proceed to verify assumptions A(1), A(2), and A(3) of Reed [6] Section 4.1 for the existence of a stationary optimal policy. It should be noted that in the two preceding examples we made an explicit assumption about the second moment of the general service distribution, to assure that costs over a service period are finite. In this particular example we must assume that the $c_i$ are such that there exists a policy $\pi_0$ for which $\phi_{\pi_0} < \infty$.

We shall first prove a lemma giving a sufficient condition that A(1) be satisfied.


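LEMMA 4.1. If $\sum_{i=0}^{\infty} \rho_2^i c_i < \infty$, A(1) is satisfied.

Proof. Let $f_0$ be the stationary policy which always uses fast service. From Reed [7] Section 2.2 we have

\[ \phi_{f_0} = \frac{\sum_{i=1}^{\infty} x_i (c_i + r_2)/(\lambda + \mu_2) + x_0 (c_0 + r_2)/\lambda}{\sum_{i=1}^{\infty} x_i/(\lambda + \mu_2) + x_0/\lambda} \]

where the $x_i$ are the stationary probabilities of the embedded chain,

\[ x_i = \frac{\mu_2}{\lambda + \mu_2}\, x_{i+1} + \frac{\lambda}{\lambda + \mu_2}\, x_{i-1}, \qquad i = 2, 3, \dots \]

\[ x_1 = \frac{\mu_2}{\lambda + \mu_2}\, x_2 + x_0, \qquad x_0 = \frac{\mu_2}{\lambda + \mu_2}\, x_1. \]

It follows that

\[ x_i = \rho_2^{\,i-1}(1 - \rho_2^2)/2, \qquad i = 1, 2, \dots \]

\[ x_0 = (1 - \rho_2)/2 \]

and

\[ \phi_{f_0} = r_2 + \sum_{i=0}^{\infty} (1 - \rho_2)\rho_2^{\,i} c_i < \infty. \]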
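The stationary vector of the embedded chain and the two expressions for $\phi_{f_0}$ can be cross-checked numerically. The following is a minimal sketch, assuming a linear holding cost $c_i = hi$ and illustrative parameter values (neither is taken from the report at this point); sums are truncated at a hypothetical cut-off `n`.

```python
def embedded_stationary(rho, i):
    """x_i for the jump chain of an M/M/1 queue with utilization rho < 1."""
    if i == 0:
        return (1.0 - rho) / 2.0
    return rho ** (i - 1) * (1.0 - rho ** 2) / 2.0


lam, mu2, r2, h = 1.0, 3.0, 4.0, 1.0   # illustrative values
rho2 = lam / mu2
n = 400                                # truncation point (assumption)

x = [embedded_stationary(rho2, i) for i in range(n + 2)]
up, down = lam / (lam + mu2), mu2 / (lam + mu2)

# Largest violation of the balance equations of the embedded chain.
residual = max(
    [abs(x[0] - down * x[1]), abs(x[1] - (x[0] + down * x[2]))]
    + [abs(x[i] - (up * x[i - 1] + down * x[i + 1])) for i in range(2, n)]
)


def cost(i):
    """Assumed linear holding cost c_i = h * i."""
    return h * i


# Ratio form of phi_{f_0}: expected cost per transition / expected time per transition.
num = x[0] * (cost(0) + r2) / lam + sum(
    x[i] * (cost(i) + r2) / (lam + mu2) for i in range(1, n)
)
den = x[0] / lam + sum(x[i] / (lam + mu2) for i in range(1, n))
phi_ratio = num / den

# Series form of phi_{f_0}.
phi_series = r2 + sum((1 - rho2) * rho2 ** i * cost(i) for i in range(n))

print(residual, phi_ratio, phi_series)
```

With these values both forms give $r_2 + h\rho_2/(1-\rho_2) = 4.5$, as expected for an M/M/1 queue with linear holding cost.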


LEMMA 4.2. If A(1) holds, A(2) is satisfied.

Proof. A(1) implies the existence of a policy $f_0$ for which $\phi_{f_0} < \infty$. Now we must show there exists a state recurrent over all policies $\pi$ for which $\phi_\pi \le \phi_{f_0}$. We define

\[ I_1 = \{ (i,1) : c_i + r_1 \le \phi_{f_0} \} = \{ (0,1), \dots, (i_1,1) \} \]

\[ I_2 = \{ (i,2) : c_i + r_2 \le \phi_{f_0} \} = \{ (0,2), \dots, (i_2,2) \}. \]

Regardless of the number in the queue the probability of entering $I_1 \cup I_2$ while a service is being performed exceeds


P  = 


y2  X+y2 


11+1 


Hence  1^  U  ^2  recurrent  with  expected  transition  time  from 
less  than 

y  X+y2  11+1 
yl 

In going from the complement of $I_1 \cup I_2$ to $I_1 \cup I_2$ one of the states $\{(i_1,1), (i_1-1,1), \dots, (i_2,1), (i_2,2)\}$ must be entered. There are $i_1 - i_2 + 2$ of these states. Since this finite set is recurrent, at least one of these states must be positive recurrent under any policy $\pi$ for which $\phi_\pi \le \phi_{f_0}$. Let $s$ be any of these states. Once $s$ is entered the probability of bringing the queue to zero before the arrival of a single customer is at least $(\mu_1/(\lambda+\mu_1))^{i_1}$. The process is in either (0,1) or (0,2) when this occurs. Since $r_1 < r_2$, the average cost per unit time may always be reduced by making decision $k = 1$ in (0,2). This restriction on the class of permissible policies in no way eliminates a possible optimal policy and we may assume that whenever the queue is empty, the process is in state (0,1). It follows that the event, "Enter $s$ and reach (0,1) before the arrival of a single customer," is positive recurrent for all permissible $\pi$. Clearly, the state (0,1) is positive recurrent for all $\pi$.

LEMMA 4.3. Assumption A(3) holds.

Proof. We use Reed [7] Theorem 4.3 and Appendix A. We have

\[ C_{(i,1)}(1) = \frac{c_i + r_1}{\lambda + \mu_1}, \qquad C_{(i,1)}(2) = 0, \]

\[ C_{(i,2)}(2) = \frac{c_i + r_2}{\lambda + \mu_2}, \qquad C_{(i,2)}(1) = 0, \]

\[ t_{(i,1)}(1) = \frac{1}{\lambda + \mu_1}, \qquad t_{(i,1)}(2) = 0, \]

\[ t_{(i,2)}(2) = \frac{1}{\lambda + \mu_2}, \qquad t_{(i,2)}(1) = 0. \]

The exclusion of trivial sequences with non-negative costs leads to the verification of Condition 1 and justifies the inclusion of $t_{(i,1)}(2) = t_{(i,2)}(1) = 0$. Since $c_i \to \infty$ with $c_i \le c_{i+1}$ and $c_i \ge c^*$, the set $S$ of Reed [7] Theorem 4.3 is finite. It follows that $V$ is bounded below.


We  summarize  these  results  in  a  theorem. 


THEOREM 4.4.

If $\sum_{i=0}^{\infty} \rho_2^i c_i < \infty$, then there exists a stationary optimal policy.


Crabill [3] Chapter VI investigates the K service rate problem. For this problem we have service rates $\mu_1 < \mu_2 < \dots < \mu_K$ with corresponding operating costs $r_1 < r_2 < \dots < r_K$. If $\rho_K = \lambda/\mu_K < 1$, one may show that a stationary optimal policy exists by proving lemmas corresponding to Lemmas 4.1, 4.2, and 4.3 in essentially the same way. One may also include rewards for completed service.


4.2.  Qualitative  Attributes  of  an  Optimal  Policy 

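We now investigate properties of $V_{f^*}$, an optimal cost function, and $f^*$, an optimal stationary policy. The cost function $V_{f^*}$ must satisfy

\[ V_{f^*}(i,1) = \min\Big[ \frac{\lambda}{\lambda+\mu_1} V_{f^*}(i+1,1) + \frac{\mu_1}{\lambda+\mu_1} V_{f^*}(i-1,1) - \frac{\phi_{f^*}}{\lambda+\mu_1} + \frac{c_i + r_1}{\lambda+\mu_1},\; V_{f^*}(i,2) \Big] \]

\[ V_{f^*}(0,1) = \min\Big[ V_{f^*}(1,1) - \frac{\phi_{f^*}}{\lambda} + \frac{c_0 + r_1}{\lambda},\; V_{f^*}(0,2) \Big] \]

\[ V_{f^*}(i,2) = \min\Big[ \frac{\lambda}{\lambda+\mu_2} V_{f^*}(i+1,2) + \frac{\mu_2}{\lambda+\mu_2} V_{f^*}(i-1,2) - \frac{\phi_{f^*}}{\lambda+\mu_2} + \frac{c_i + r_2}{\lambda+\mu_2},\; V_{f^*}(i,1) \Big] \]

\[ V_{f^*}(0,2) = \min\Big[ V_{f^*}(1,2) - \frac{\phi_{f^*}}{\lambda} + \frac{c_0 + r_2}{\lambda},\; V_{f^*}(0,1) \Big]. \]

Since $V_{f^*}(i,1) \le V_{f^*}(i,2) \le V_{f^*}(i,1)$, $V_{f^*}(i,1) = V_{f^*}(i,2)$ and the above functional equations may be rewritten as

\[ V_{f^*}(i) = \min_{k=1,2} \Big[ \frac{\lambda}{\lambda+\mu_k} V_{f^*}(i+1) + \frac{\mu_k}{\lambda+\mu_k} V_{f^*}(i-1) - \frac{\phi_{f^*}}{\lambda+\mu_k} + \frac{c_i + r_k}{\lambda+\mu_k} \Big], \qquad i = 1, 2, \dots \]

\[ V_{f^*}(0) = \min_{k=1,2} \Big[ V_{f^*}(1) - \frac{\phi_{f^*}}{\lambda} + \frac{c_0 + r_k}{\lambda} \Big]. \]

We see that if the minimum is attained for $k = 1$, then $f^*(i) = 1$. Any $i$ for which this is true will be called a slow service point. If the minimum is attained for $k = 2$, then $f^*(i) = 2$. Any $i$ for which this is true will be called a fast service point. The determination of an optimal policy is equivalent to optimally classifying each $i$ as slow or fast. We shall agree that if for some $i$ the minimum is attained for both $k = 1$ and $k = 2$, we shall set $f^*(i) = 1$.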

We now prove a number of lemmas which correspond to those of Crabill [3], Chapter II. The difference is that there is never any reference to truncated problems. The lemma of Crabill's which most nearly corresponds to each of the following lemmas will be noted. It should also be pointed out that the inequality manipulations are much the same as Crabill's.


LEMMA 4.5. [Crabill Lemma 1]: $f^*(0) = 1$.

Proof. Since $r_1 < r_2$,

\[ V_{f^*}(1) - \frac{\phi_{f^*}}{\lambda} + \frac{c_0 + r_1}{\lambda} < V_{f^*}(1) - \frac{\phi_{f^*}}{\lambda} + \frac{c_0 + r_2}{\lambda}. \]

LEMMA 4.6. [Crabill Lemma 2]: Letting $R = \dfrac{r_2 - r_1}{\mu_2 - \mu_1}$,

\[ V_{f^*}(i) - V_{f^*}(i-1) > R \iff f^*(i) = 2, \qquad i = 1, 2, \dots \]

\[ V_{f^*}(i) - V_{f^*}(i-1) \le R \iff f^*(i) = 1, \qquad i = 1, 2, \dots \]

Proof. This follows immediately from the functional equation defining an optimal policy.


LEMMA 4.7. [Crabill Lemma 3]: The set of $i$ for which $f^*(i) = 2$ is unbounded.

Proof. Assume the contrary; then there exists an $N$ such that for all $i \ge N$, $f^*(i) = 1$ and

\[ \mu_1 (V_{f^*}(i) - V_{f^*}(i-1)) = \lambda (V_{f^*}(i+1) - V_{f^*}(i)) - \phi_{f^*} + c_i + r_1. \]

From Lemma 4.6

Proof. The proof of this lemma is virtually identical to Crabill's Lemma 4. Assume $f^*(i) = 2$ and $f^*(i+1) = 1$.

\[ f^*(i) = 2 \;\Rightarrow\; V_{f^*}(i) - V_{f^*}(i-1) > R \]

\[ f^*(i+1) = 1 \;\Rightarrow\; V_{f^*}(i+1) - V_{f^*}(i) \le R. \]

For $i$ we have

\[ \phi_{f^*} = c_i + r_2 - \mu_2 (V_{f^*}(i) - V_{f^*}(i-1)) + \lambda (V_{f^*}(i+1) - V_{f^*}(i)) \]

and

\[ \phi_{f^*} < c_i + r_2 - (\mu_2 - \lambda) R. \]

For $i+1$ we have

\[ \phi_{f^*} = c_{i+1} + r_1 - \mu_1 (V_{f^*}(i+1) - V_{f^*}(i)) + \lambda (V_{f^*}(i+2) - V_{f^*}(i+1)) \]

and

\[ \phi_{f^*} \ge c_{i+1} + r_1 - \mu_1 R + \lambda (V_{f^*}(i+2) - V_{f^*}(i+1)). \]

It follows that

\[ \lambda (V_{f^*}(i+2) - V_{f^*}(i+1)) < \lambda R - (c_{i+1} - c_i) \]

and

\[ V_{f^*}(i+2) - V_{f^*}(i+1) < R \;\Rightarrow\; f^*(i+2) = 1. \]

For $i+2$ we have

\[ \phi_{f^*} = c_{i+2} + r_1 - \mu_1 (V_{f^*}(i+2) - V_{f^*}(i+1)) + \lambda (V_{f^*}(i+3) - V_{f^*}(i+2)) \]

and

\[ \phi_{f^*} \ge c_{i+2} + r_1 - \mu_1 R + \lambda (V_{f^*}(i+3) - V_{f^*}(i+2)). \]

It follows that

\[ \lambda (V_{f^*}(i+3) - V_{f^*}(i+2)) \le \phi_{f^*} - c_{i+2} - r_1 + \mu_1 R < \lambda R - (c_{i+2} - c_i). \]

This implies

\[ f^*(i+3) = 1. \]

Proceeding in this same way, $f^*(i+k) = 1$ for $k = 2, 3, 4, \dots$. This result contradicts Lemma 4.7.


We  are  now  in  a  position  to  state  a  stronger  theorem  than  that  of 
Crabill. 


THEOREM 4.9. [Crabill Theorem 1]

If $\sum_{i=0}^{\infty} \rho_2^i c_i < \infty$, then a stationary optimal policy $f^*$ exists and is characterized by a single positive finite integer $N$, such that $f^*(i) = 1$ for $i < N$ and $f^*(i) = 2$ for $i \ge N$.

Proof. The proof follows as a consequence of Theorem 4.4, Lemma 4.5, and Lemma 4.8.
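Because a threshold policy makes the queue a birth-death process, its average cost can be evaluated numerically and the best threshold found by enumeration. The sketch below is one way to do this under stated assumptions: the state space is truncated at a hypothetical `max_state` (the report's proofs avoid truncation; it is used here only for computation), the holding cost `c` is a user-supplied function, and all parameter values are illustrative.

```python
def average_cost(N, lam, mu1, mu2, r1, r2, c, max_state=400):
    """Average cost per unit time of the policy: slow rate for i < N, fast for i >= N.

    Uses the stationary distribution of the birth-death process, truncated
    at max_state (a numerical assumption, adequate when lam < mu2).
    """
    weights = [1.0]  # unnormalized stationary probabilities
    for i in range(1, max_state + 1):
        service_rate = mu1 if i < N else mu2
        weights.append(weights[-1] * lam / service_rate)
    total = sum(weights)
    cost = sum(
        w * (c(i) + (r1 if i < N else r2)) for i, w in enumerate(weights)
    )
    return cost / total


# Illustrative parameters and an assumed quadratic holding cost c_i = i^2.
lam, mu1, mu2, r1, r2 = 1.0, 1.5, 3.0, 1.0, 4.0


def quadratic(i):
    return float(i * i)


costs = [average_cost(N, lam, mu1, mu2, r1, r2, quadratic) for N in range(30)]
best_N = min(range(30), key=lambda N: costs[N])
print("best threshold:", best_N, "average cost:", costs[best_N])
```

Enumeration is adequate here because Theorem 4.9 reduces the search to a single integer; the linear holding cost case treated next admits a closed form instead.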

The  approach  to  the  control  of  the  M/M/1  queue  with  variable  service 
rate  presented  here  is  formulated  as  a  semi-Markov  decision  process, 
whereas  Crabill  formulated  the  problem  as  a  continuous  time  Markov  decision 
process.  The  semi-Markov  formulation  allows  for  instantaneous  changes 
in  state  in  a  natural  way,  whereas  instantaneous  changes  in  state  present 
some  conceptual  problems  in  the  continuous  time  formulation. 

Since this result shows that the optimal stationary policy is optimal over all admissible policies, the remarks following Theorem 4.4 show that Crabill's optimal stationary policy for the K service rate problem is an optimal policy. The ideas presented in this section may be combined with those of Crabill to extend the results to K service rates. In particular, proofs that do not depend on queue truncation may be made.

4.3.  Quantitative  Results  for  the  Linear  Holding  Cost  Case 

In the linear holding cost case we have $c_i = hi$ where $h$ is the cost per unit time of holding a customer in the queueing system. Based on the preceding qualitative results, we are led to the following system of difference equations to obtain relative costs $v_i$ and $\phi_N$, the expected average cost per unit time associated with a policy of the form described in Theorem 4.9:

\[ \lambda v_0 - \lambda v_1 = r_1 - \phi_N \]

\[ (\lambda + \mu_1) v_i - \mu_1 v_{i-1} - \lambda v_{i+1} = hi + r_1 - \phi_N, \qquad 0 < i < N \]

\[ (\lambda + \mu_2) v_i - \mu_2 v_{i-1} - \lambda v_{i+1} = hi + r_2 - \phi_N, \qquad i \ge N. \]

For $i \ge N$ we set

\[ v_i = A_2 + B_2 i + C_2 i^2. \]

For $i < N$ we make use of the homogeneous solution and have

\[ v_i = B_1 i + C_1 i^2 + K\Big[\Big(\frac{\mu_1}{\lambda}\Big)^i - 1\Big], \]

which makes $v_0 = 0$. One easily finds that


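\[ B_i = \frac{h(\mu_i + \lambda)}{2(\mu_i - \lambda)^2} + \frac{r_i - \phi_N}{\mu_i - \lambda}, \qquad i = 1, 2 \]

\[ C_i = \frac{h}{2(\mu_i - \lambda)}, \qquad i = 1, 2. \]

The above system must satisfy the boundary conditions

\[ A_2 + B_2 N + C_2 N^2 = B_1 N + C_1 N^2 + K\Big[\Big(\frac{\mu_1}{\lambda}\Big)^N - 1\Big] \]

\[ A_2 + B_2 (N-1) + C_2 (N-1)^2 = B_1 (N-1) + C_1 (N-1)^2 + K\Big[\Big(\frac{\mu_1}{\lambda}\Big)^{N-1} - 1\Big]. \]

Finally,

\[ \phi_N = \lambda v_1 + r_1 = \lambda (B_1 + C_1) + \lambda K\Big[\frac{\mu_1}{\lambda} - 1\Big] + r_1. \]

Elimination of $A_2$ in the boundary condition equation and substitution for $B_1$, $C_1$, $B_2$, and $C_2$ gives

\[ \phi_N = \frac{K(\mu_2 - \lambda)(\mu_1 - \lambda)^2}{\lambda(\mu_2 - \mu_1)} \Big(\frac{\mu_1}{\lambda}\Big)^{N-1} + \frac{\rho_1 h}{1 - \rho_1} + \frac{\rho_2 h}{1 - \rho_2} + r_1 + hN - \frac{(r_2 - r_1)(\mu_1 - \lambda)}{\mu_2 - \mu_1}. \]

Substitution for $B_1 + C_1$ in the expression for $\phi_N$ yields

\[ \phi_N = \frac{K(\mu_1 - \lambda)^2}{\mu_1} + \frac{\rho_1 h}{1 - \rho_1} + r_1 \]

where $\rho_i = \lambda/\mu_i$. Elimination of $K$ gives

\[ \phi_N = \frac{\rho_1 h}{1 - \rho_1} + r_1 + \frac{\dfrac{\rho_2 h}{1 - \rho_2} + r_2 + hN + \dfrac{r_1(\mu_1 - \lambda) - r_2(\mu_2 - \lambda)}{\mu_2 - \mu_1}}{1 - \dfrac{\mu_2 - \lambda}{\mu_2 - \mu_1}\Big(\dfrac{\mu_1}{\lambda}\Big)^N}. \]

One may verify that

\[ \phi_0 = \frac{\rho_2 h}{1 - \rho_2} + r_2 \qquad \text{(Always use } \mu_2\text{)} \]

\[ \phi_1 = \frac{\rho_2 h}{1 - \rho_2} + \rho_2 r_2 + (1 - \rho_2) r_1 \qquad \text{(Use } \mu_1 \text{ only for } i = 0\text{)} \]

and providing $\lambda < \mu_1$,

\[ \phi_\infty = \frac{\rho_1 h}{1 - \rho_1} + r_1 \qquad \text{(Always use } \mu_1\text{)}. \]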
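The closed form for $\phi_N$ can be cross-checked against a direct computation from the stationary distribution of the birth-death process induced by a threshold policy. The sketch below is an independent check, not the report's derivation: `phi_birth_death` sums the geometric stationary weights explicitly, while `phi_closed_form` evaluates the expression obtained by eliminating $K$. Function names and parameter values are illustrative assumptions; the check needs $\lambda < \mu_2$ and $\lambda \ne \mu_1$.

```python
def phi_birth_death(N, lam, mu1, mu2, r1, r2, h):
    """Average cost of the threshold-N policy from stationary geometric sums."""
    a, b = lam / mu1, lam / mu2
    # The service rate chosen in state 0 never acts, so N = 0 and N = 1
    # share the same stationary weights; only the operating cost differs.
    n_eff = max(N, 1)
    head0 = sum(a ** i for i in range(n_eff))          # mass of states 0..n_eff-1
    head1 = sum(i * a ** i for i in range(n_eff))      # first moment of that block
    scale = a ** (n_eff - 1)
    tail0 = scale * b / (1 - b)                        # mass of states >= n_eff
    tail1 = scale * ((n_eff - 1) * b / (1 - b) + b / (1 - b) ** 2)
    slow_mass = head0 if N >= 1 else 0.0               # states operated at rate mu1
    fast_mass = head0 + tail0 - slow_mass              # states operated at rate mu2
    cost = h * (head1 + tail1) + r1 * slow_mass + r2 * fast_mass
    return cost / (head0 + tail0)


def phi_closed_form(N, lam, mu1, mu2, r1, r2, h):
    """phi_N from the closed form obtained by eliminating K."""
    rho1, rho2 = lam / mu1, lam / mu2
    phi_inf = rho1 * h / (1 - rho1) + r1
    phi_0 = rho2 * h / (1 - rho2) + r2
    numerator = phi_0 + h * N + (r1 * (mu1 - lam) - r2 * (mu2 - lam)) / (mu2 - mu1)
    denominator = 1 - (mu2 - lam) / (mu2 - mu1) * (mu1 / lam) ** N
    return phi_inf + numerator / denominator


params = dict(lam=1.0, mu1=1.5, mu2=3.0, r1=1.0, r2=4.0, h=1.0)  # illustrative
for N in range(6):
    print(N, phi_birth_death(N, **params), phi_closed_form(N, **params))
```

For these values the two computations agree, and $N = 0$ and $N = 1$ reproduce the special cases $\phi_0 = 4.5$ and $\phi_1 = 2.5$ listed above.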


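Setting

\[ G(N) = \frac{\phi_0 + hN + \dfrac{r_1(\mu_1 - \lambda) - r_2(\mu_2 - \lambda)}{\mu_2 - \mu_1}}{\dfrac{\mu_2 - \lambda}{\mu_2 - \mu_1}\Big(\dfrac{\mu_1}{\lambda}\Big)^N - 1}, \]

we have

\[ \phi(N) = \phi_\infty - G(N). \]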


It is conjectured in Crabill [3] that $\phi$ is a convex function of $N$. We now show that $\phi$ is not convex in $N$. A necessary and sufficient condition for $\phi$ to be convex in $N$ is that

\[ \frac{d^2 G(N)}{dN^2} \le 0 \qquad \text{for all } N. \]


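Consider the case $\lambda < \mu_1$. For convenience we write

\[ G(N) = \frac{A + hN}{k_1 \beta^N - 1} \]

where

\[ k_1 = (\mu_2 - \lambda)/(\mu_2 - \mu_1) > 1 \]

\[ \beta = \frac{\mu_1}{\lambda} > 1 \]

\[ A = \phi_0 + \frac{r_1(\mu_1 - \lambda) - r_2(\mu_2 - \lambda)}{\mu_2 - \mu_1}; \]

then

\[ G'(N) = \frac{(k_1\beta^N - 1)h - (A + hN) k_1 \beta^N \log\beta}{(k_1\beta^N - 1)^2} = \frac{h - G(N) k_1 \beta^N \log\beta}{k_1\beta^N - 1} \]

and

\[ G''(N) = \frac{-(k_1\beta^N - 1)\big[ G(N) k_1 \beta^N (\log\beta)^2 + k_1\beta^N \log\beta\, G'(N) \big] - \big(h - G(N) k_1 \beta^N \log\beta\big) k_1 \beta^N \log\beta}{(k_1\beta^N - 1)^2} \]

which after some manipulation reduces to

\[ G''(N) = \frac{k_1 \beta^N \log\beta}{(k_1\beta^N - 1)^2} \Big[ -2h + (A + hN) \log\frac{\mu_1}{\lambda} \cdot \frac{1 + k_1\beta^N}{-1 + k_1\beta^N} \Big]. \]

Since $A + hN > 0$ for $N$ sufficiently large,

\[ G''(N) > \frac{k_1 \beta^N \log\beta}{(k_1\beta^N - 1)^2} \Big[ -2h + \log\frac{\mu_1}{\lambda}\, (A + hN) \Big] \]

for $N$ sufficiently large. Also for $N$ sufficiently large the expression in brackets is positive, so $\phi(N)$ is not convex.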

However, $\phi(N)$ is unimodal, since $G(N)$ is unimodal. To show this when $\lambda < \mu_1$ consider $G(N)$. Since $A + hN > 0$ for $N$ sufficiently large, $G(N) > 0$ for $N$ sufficiently large. Moreover,

\[ \lim_{N \to \infty} G(N) = 0, \]

showing that $G$ is decreasing for sufficiently large $N$. Since

\[ G(1) - G(0) = \phi(0) - \phi(1) > 0, \]

$G$ is increasing for some positive $N$ and hence $G$ has at least one relative maximum for positive values of $N$.

A necessary condition for a relative maximum is that $G'(N) = 0$. Thus we are interested in $N$ for which

\[ A + hN = \frac{(k_1\beta^N - 1)h}{k_1 \beta^N \log\beta}. \]

We observe that the expression on the right as a function of $N$ is non-negative, increasing and strictly concave. At $N = 0$ this function has value $(1 - 1/k_1)\, h/\log\beta$ and as $N \to \infty$ the function approaches $h/\log\beta$. The derivative at $N = 0$ is $h/k_1$. The expression on the left is linear in $N$ with value $A$ at $N = 0$ and derivative $h$ at $N = 0$. Since $G$ has at least one relative maximum for positive $N$,

\[ (1 - 1/k_1)\frac{h}{\log\beta} > A. \]

If this were not the case, there would be no positive $N$ satisfying the condition $G'(N) = 0$. This follows from the concavity of the expression on the right and the fact that slopes at zero are such that

\[ h/k_1 < h. \]

We may plot both expressions for $N$ to graphically solve for minimizing values of $N$. A typical plot is given in Figure 1.

From the concavity of the expression on the right and the linearity of the expression on the left the $N$ for which $G'(N) = 0$ is unique, so $G(N)$ has exactly one relative maximum for positive $N$.

The value of $N$ which maximizes $G$ may be obtained by successive approximation. We begin by solving

\[ A + h x_0 = (1 - 1/k_1)\frac{h}{\log\beta} \]

and then, for $i = 1, 2, \dots$,

\[ A + h x_i = \frac{(k_1 \beta^{x_{i-1}} - 1)h}{k_1 \beta^{x_{i-1}} \log\beta}. \]

A graphical interpretation of this procedure is given in Figure 2.

Figure 2. Successive approximation solution for optimal N.

If $\mu_1 \le \lambda$, unimodality may be established similarly but the method of successive approximations will diverge, so one must solve the transcendental equation associated with a relative minimum differently or use a search technique to minimize $\phi_N$ directly. Since $\phi_N$ is unimodal, standard procedures may be used. Another possibility is to use the policy improvement algorithm. With this algorithm, if one begins with a policy such that $f(i) = 2$ for $i \ge N$ and $f(i) = 1$ for $i < N$, then all improvements will be of this form and the method converges to the optimal value of $N$.
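The successive approximation scheme for the case $\lambda < \mu_1$ can be sketched as follows. This is a minimal illustration under assumed parameter values; the function names, tolerance, and iteration cap are hypothetical, and the integer threshold is taken by comparing $\phi$ at the two integers adjacent to the real-valued maximizer of $G$, which relies on the unimodality established above.

```python
import math


def constants(lam, mu1, mu2, r1, r2, h):
    """Return (A, k1, beta, phi_inf) as defined in the text (needs lam < mu1 < mu2)."""
    k1 = (mu2 - lam) / (mu2 - mu1)
    beta = mu1 / lam
    phi_0 = h * lam / (mu2 - lam) + r2
    A = phi_0 + (r1 * (mu1 - lam) - r2 * (mu2 - lam)) / (mu2 - mu1)
    phi_inf = h * lam / (mu1 - lam) + r1
    return A, k1, beta, phi_inf


def maximize_G(lam, mu1, mu2, r1, r2, h, tol=1e-10, max_iter=1000):
    """Successive approximation for the real N with G'(N) = 0."""
    A, k1, beta, _ = constants(lam, mu1, mu2, r1, r2, h)

    def rhs(x):
        # (k1 beta^x - 1) h / (k1 beta^x log beta), written in an equivalent form.
        return (1.0 - 1.0 / (k1 * beta ** x)) * h / math.log(beta)

    x = (rhs(0.0) - A) / h          # x_0 from A + h x_0 = (1 - 1/k1) h / log(beta)
    for _ in range(max_iter):
        x_next = (rhs(x) - A) / h   # A + h x_i = rhs(x_{i-1})
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


def phi(N, lam, mu1, mu2, r1, r2, h):
    """phi(N) = phi_inf - G(N)."""
    A, k1, beta, phi_inf = constants(lam, mu1, mu2, r1, r2, h)
    return phi_inf - (A + h * N) / (k1 * beta ** N - 1.0)


params = dict(lam=1.0, mu1=1.5, mu2=3.0, r1=1.0, r2=4.0, h=1.0)  # illustrative
x_star = maximize_G(**params)
candidates = [math.floor(x_star), math.ceil(x_star)]
best_N = min(candidates, key=lambda N: phi(N, **params))
brute_N = min(range(51), key=lambda N: phi(N, **params))
print("real maximizer of G:", x_star)
print("threshold from successive approximation:", best_N, "brute force:", brute_N)
```

The brute-force search over integers is included only as a check; when $\mu_1 \le \lambda$ the iteration is not applicable and a direct search on $\phi_N$ of this kind would be used instead.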
Appendix A

FUNCTIONAL  EQUATIONS  OF  OPTIMALITY 


Introduction 

This appendix is written to display the functional equations of optimality for special cases that arise in the queueing applications in this report. We recall that a stationary semi-Markov process starts at time 0 in state $i = 0, 1, \dots$ with probability $P_i$. With the observation of state $i$ an action $k = 1, 2, \dots,$ or $K$ is taken. The next state, $j$, of the process occurs with probability $P_{ij}(k)$. Conditional on the event that the next state is $j$, the time until the transition from $i$ to $j$ occurs is a random variable with distribution function $F_{ij}(\cdot|k)$. With the observation of state $j$ an action $k = 1, 2, \dots, K$ is taken and this procedure goes on indefinitely. Whenever the process is in state $i$ and action $k$ is taken, a cost is incurred which depends on random events occurring during the transition interval.

We are interested in two optimization criteria, minimum expected average cost per unit time and minimum expected total discounted cost. For the average cost criterion the functional equation of optimality is given by

\[ V_{f^*}(i) = \min_k \Big\{ C_i(k) - \phi_{f^*} t_i(k) + \sum_j P_{ij}(k) V_{f^*}(j) \Big\} \]

where $C_i(k)$ is the expected cost of a transition and $t_i(k)$ is the expected transition time when action $k$ is taken in state $i$. The expected average cost per unit time, $\phi_{f^*}$, is obtained from $V_{f^*}(0) = 0$.

The functional equation of optimality for the discounted cost criterion is given by

\[ V_\beta(i) = \min_k \Big\{ C_i(k) + \sum_j P_{ij}(k) \int_0^\infty e^{-\beta t} V_\beta(j)\, dF_{ij}(t|k) \Big\} \]

where $C_i(k)$ is the expected discounted cost of a transition when action $k$ is taken in state $i$, and the discount factor $\beta$ implies a cost $C$ at time $t$ contributes $Ce^{-\beta t}$ to the total discounted cost.


Functional  Equations  in  a  Special  Case 

In this section we consider the form of the above functional equations for a special case that arises in the applications. First, we assume that

\[ F_i(t|k) = \sum_j P_{ij}(k) F_{ij}(t|k) \]

is basic in the problem formulation. We also assume that if action $k$ is taken in state $i$ and the transition time is $t$ then $j$, the next state of the process, is given by

\[ j = i + m + s \]

where $m$ has a Poisson distribution with parameter $\lambda_i(k)t$, and $s$ is independent of $m$ and $t$ with finite distribution $P_i(s|k)$. It is understood that $s$ may take on negative as well as positive integer values. We set

\[ p(n;\lambda) = \frac{\lambda^n}{n!} e^{-\lambda} \]

and our problem is to express $P_{ij}(k)$ and $F_{ij}(t|k)$ in terms of $F_i(t|k)$, $P_i(s|k)$ and $p(n;\lambda_i(k)t)$.

For fixed $t$

\[ P_{ij}(k \mid t) = \sum_v P_i(s+v|k)\, p(m-v;\lambda_i(k)t) \]

and

\[ P_{ij}(k) = \sum_v P_i(s+v|k) \int_0^\infty p(m-v;\lambda_i(k)t)\, dF_i(t|k). \]

The joint event, the next state is $j$ and the transition time is less than $t$ given state $i$ and decision $k$, has probability

\[ P[j \text{ and } T \le t] = \sum_v P_i(s+v|k) \int_0^t p(m-v;\lambda_i(k)x)\, dF_i(x|k). \]

It follows that

\[ F_{ij}(t|k) = \frac{\sum_v P_i(s+v|k) \int_0^t p(m-v;\lambda_i(k)x)\, dF_i(x|k)}{\sum_v P_i(s+v|k) \int_0^\infty p(m-v;\lambda_i(k)x)\, dF_i(x|k)}. \]

For the average cost criterion the functional equation of optimality becomes

\[ V_{f^*}(i) = \min_k \Big\{ C_i(k) - \phi_{f^*} t_i(k) + \sum_s \sum_m \sum_v P_i(s+v|k) V_{f^*}(i+m+s) \int_0^\infty p(m-v;\lambda_i(k)t)\, dF_i(t|k) \Big\}. \]

For the discounted cost criterion we have

\[ V_\beta(i) = \min_k \Big\{ C_i(k) + \sum_s \sum_m \sum_v P_i(s+v|k) V_\beta(i+m+s) \int_0^\infty e^{-\beta t} p(m-v;\lambda_i(k)t)\, dF_i(t|k) \Big\}. \]

If a particular action $k$ is taken in state $i$ then $F_i(t|k)$, $P_i(s|k)$, and $\lambda_i(k)$ will be determined for a given $i$ and $k$. This consideration allows one to write in more detail the entries for particular $i$ and $k$ on the right hand sides of the functional equations above. There are essentially two classes of actions used in this report.

If action $k$ belongs to class 1 and it is taken in state $i$ then the transition time is exponential and the transition rate from state $i$ to state $j$ is $\alpha_{ij}(k)$. In this case

\[ F_i(t|k) = 1 - e^{-\alpha_i(k)t} \]

where

\[ \alpha_i(k) = \sum_j \alpha_{ij}(k). \]

For actions of class 1 it is further assumed that $\lambda_i(k) = 0$. It follows that in the equation $j = i + m + s$, $m = 0$ with probability 1, and

\[ P_i(j-i|k) = \frac{\alpha_{ij}(k)}{\alpha_i(k)} = P_{ij}(k). \]

For the discounted cost criterion in this case,

\[ \sum_s \sum_m \sum_v P_i(s+v|k) V_\beta(i+m+s) \int_0^\infty e^{-\beta t} p(m-v;\lambda_i(k)t)\, dF_i(t|k) = \sum_j \frac{\alpha_{ij}(k)}{\alpha_i(k)} V_\beta(j) \int_0^\infty e^{-\beta t} \alpha_i(k) e^{-\alpha_i(k)t}\, dt = \sum_j \frac{\alpha_{ij}(k)}{\alpha_i(k) + \beta} V_\beta(j). \]

If action $k$ belongs to class 2 and it is taken in state $i$ then $F_i(t|k)$ is general, $\lambda_i(k) > 0$, and there exists an $s_0$ such that $P_i(s_0|k) = 1$. For example, $s_0 = -1$ and $s_0 = -i$ are important special cases. We have for the average cost criterion in this case

\[ t_i(k) = \int_0^\infty t\, dF_i(t|k) = \mu_i^{-1}(k) \]

and

\[ \sum_s \sum_m \sum_v P_i(s+v|k) V_{f^*}(i+m+s) \int_0^\infty p(m-v;\lambda_i(k)t)\, dF_i(t|k) = \sum_m V_{f^*}(i+m+s_0) \int_0^\infty p(m;\lambda_i(k)t)\, dF_i(t|k). \]

For the discounted cost criterion in this case,

\[ \sum_s \sum_m \sum_v P_i(s+v|k) V_\beta(i+m+s) \int_0^\infty e^{-\beta t} p(m-v;\lambda_i(k)t)\, dF_i(t|k) = \sum_m V_\beta(i+m+s_0) \int_0^\infty e^{-\beta t} p(m;\lambda_i(k)t)\, dF_i(t|k). \]

If, moreover,

\[ F_i(t|k) = 0, \quad t < 0 \]

\[ F_i(t|k) = 1, \quad t \ge 0 \]

then the right hand side above is simply $V_{f^*}(i+s_0)$ for the average cost criterion and $V_\beta(i+s_0)$ for the discounted cost criterion.

Expected  Transition  Costs 

To complete the specification of the functional equations of optimality we must compute $C_i(k)$, the expected cost of a transition if action $k$ is taken in state $i$. For the discounted cost criterion this calculation is required for all $i$ and $k$ assuming that action $k$ is taken at time 0. Since the calculation for the average cost criterion is the same regardless of the time of the action it is convenient to always think of action $k$ taken in state $i$ at time 0.

We now derive $C_i(k)$ when the cost structure is linear. If at time 0 the process is in $i$ and action $k$ is taken, an instantaneous cost $c_{ik}$ is incurred and costs start to accrue at a rate of $r_k + h_k i$. Costs accrue at this rate until either the transition ends or the occurrence of a random event at time $T_1$. If $T_1$ occurs before the end of the transition then beginning at time $T_1$ costs accrue at a rate of $r_k + h_k(i+1)$. Thus if $m$ random events occur at times $0 < T_1 < T_2 < \dots < T_m < t$ during the transition time interval $(0,t)$, the cost rate over time interval $(T_n, T_{n+1})$ is $r_k + h_k(i+n)$ for $n = 0, 1, \dots, m$ where $T_0 = 0$ and $T_{m+1} = t$. We assume that $T_1, T_2, \dots$ are generated according to a Poisson process with parameter $\lambda_i(k)$. It follows that for a transition time $t$ the number of random events occurring at times $T_1 < T_2 < \dots < t$ has a Poisson distribution with parameter $\lambda_i(k)t$.

We recall that if action $k$ is taken in state $i$ and the transition time is $t$ then $j$, the next state of the process, satisfies

\[ j = i + m + s \]

where $m$ has a Poisson distribution with parameter $\lambda_i(k)t$. Whereas only the number of these events is important in determining the next state of the process, both the number and times of occurrence of these events determine transition costs.

For the average cost criterion the expected transition cost if action $k$ is taken in state $i$, given $T_1, T_2, \dots, T_m$, $m$, and $t$ is

\[ C_i(k \mid T_1, T_2, \dots, T_m, m, t) = c_{ik} + \sum_{n=0}^{m} (T_{n+1} - T_n)(r_k + h_k(i+n)) \]

\[ = c_{ik} + (r_k + h_k i)t + h_k m t - h_k \sum_{n=1}^{m} T_n. \]
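The equality of the interval-by-interval sum and the collapsed form can be checked on a concrete set of event times. The sketch below uses hypothetical values for $c_{ik}$, $r_k$, $h_k$, $i$, $t$ and the event times; both function names are illustrative.

```python
def cost_by_intervals(c_ik, r_k, h_k, i, event_times, t):
    """c_ik plus the sum over intervals (T_n, T_{n+1}) of length * cost rate."""
    grid = [0.0] + list(event_times) + [t]   # T_0 = 0, T_1..T_m, T_{m+1} = t
    total = c_ik
    for n in range(len(grid) - 1):
        total += (grid[n + 1] - grid[n]) * (r_k + h_k * (i + n))
    return total


def cost_collapsed(c_ik, r_k, h_k, i, event_times, t):
    """c_ik + (r_k + h_k i) t + h_k m t - h_k * sum of the event times."""
    m = len(event_times)
    return c_ik + (r_k + h_k * i) * t + h_k * m * t - h_k * sum(event_times)


# Hypothetical example: three events during a transition of length 2.
args = dict(c_ik=0.5, r_k=1.0, h_k=2.0, i=3, event_times=[0.4, 1.1, 1.7], t=2.0)
print(cost_by_intervals(**args), cost_collapsed(**args))
```

Both expressions evaluate to 20.1 for this example; the collapsed form is convenient because its expectation depends only on the number of events and the sum of their times.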

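Similarly for the discounted cost criterion

\[ C_i(k \mid T_1, T_2, \dots, T_m, m, t) = c_{ik} + \sum_{n=0}^{m} (r_k + h_k(i+n)) \int_{T_n}^{T_{n+1}} e^{-\beta s}\, ds \]

\[ = c_{ik} + (r_k + h_k i) \int_0^t e^{-\beta s}\, ds + h_k \sum_{n=0}^{m} n \int_{T_n}^{T_{n+1}} e^{-\beta s}\, ds \]

\[ = c_{ik} + \frac{(r_k + h_k i)(1 - e^{-\beta t}) - h_k m e^{-\beta t} + h_k \sum_{n=1}^{m} e^{-\beta T_n}}{\beta}. \]

\[ \nu_i^2(k) = \int_0^\infty t^2\, dF_i(t|k). \]

For the discounted cost criterion

\[ C_i(k) = c_{ik} + \frac{r_k + h_k i}{\beta} \Big( 1 - \int_0^\infty e^{-\beta t}\, dF_i(t|k) \Big) + \frac{h_k \lambda_i(k)}{\beta^2} \Big( 1 - \int_0^\infty e^{-\beta t}\, dF_i(t|k) - \beta \int_0^\infty t e^{-\beta t}\, dF_i(t|k) \Big). \]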


For actions in class 1 where $\lambda_i(k) = 0$ and $F_i(t|k) = 1 - e^{-\alpha_i(k)t}$ the expected transition cost for the average cost criterion is

\[ C_i(k) = c_{ik} + \frac{r_k + h_k i}{\alpha_i(k)}. \]

For the discounted cost criterion in this case

\[ C_i(k) = c_{ik} + \frac{r_k + h_k i}{\alpha_i(k) + \beta}. \]

For actions in class 2 when $\lambda_i(k) > 0$ there is no special reduction of the above formulae. If, however,

\[ F_i(t|k) = 0, \quad t < 0 \]

\[ F_i(t|k) = 1, \quad t \ge 0 \]

then

\[ C_i(k) = c_{ik} \]

for both the average cost and discounted cost criterion.
Appendix B

GLOSSARY

This  glossary  summarizes  the  notation  and  key  results  of  Reed  [7]. 

Section  A  provides  basic  concepts.  Section  B  is  concerned  with  the  case 
when  the  optimization  criterion  is  minimum  expected  discounted  costs. 

Section  C  is  concerned  with  the  case  when  the  optimization  criterion  is 
minimum  expected  average  cost  per  unit  time.  All  theorem  numbers  refer 
to  theorems  in  Reed  [7]. 

A.  Basic  Concepts  and  Notation 

Semi-Markov decision process: A sequential decision process associated with a semi-Markov process which starts at time 0 in one of the states $i = 0, 1, 2, \dots$. With the observation of state $i$, an action $k = 1, 2, \dots,$ or $K$ is taken. The next state, $j$, of the process occurs with probability $P_{ij}(k)$. Conditional on the event that the next state is $j$, the time until the transition from $i$ to $j$ occurs is a random variable with distribution function $F_{ij}(\cdot|k)$. With the observation of state $j$, an action $k = 1, 2, \dots,$ or $K$ is taken, and this iterative procedure goes on indefinitely. The transition time distribution when action $k$ is taken in state $i$ is given by

\[ F_i(t|k) = \sum_j P_{ij}(k) F_{ij}(t|k). \]

Trivial sequence of decisions: A sequence of decisions $k_1, k_2, \dots, k_m$ is said to be trivial with respect to state $i$ if

\[ P_{i_r i_{r+1}}(k_r) = 1, \qquad r = 1, 2, \dots, m \]

and

\[ F_{i_r i_{r+1}}(t|k_r) = \begin{cases} 0, & t < 0 \\ 1, & t \ge 0 \end{cases} \]

for $r = 1, 2, \dots, m$, where $i_1 = i_{m+1} = i$ and $i_r \ne i$ for $r = 2, 3, \dots, m$.


Condition 1: There exists an integer $n_0$ and a fraction $q$ with $0 < q < 1$ such that for all $n \ge n_0$ with decisions $k_1, k_2, \dots, k_n$ in states $i_1, i_2, \dots, i_n$ and transition time distributions $F_{i_1 i_2}(t|k_1), \dots, F_{i_n i_{n+1}}(t|k_n)$ there exists an $\epsilon > 0$ and a $\delta > 0$ such that for $j = j_1, j_2, \dots, j_{n^*}$

\[ F_{i_j i_{j+1}}(\delta|k_j) < 1 - \epsilon \]

where $j_1, j_2, \dots, j_{n^*}$ is a subsequence of $1, 2, \dots, n$ and $n^* \ge qn$.


M/M/1 Queue: A single server queue where customers arrive according to a Poisson process with parameter $\lambda$. Service times are independently and identically distributed with cumulative distribution function $1 - e^{-\mu t}$.

M/G/1 Queue: A single server queue where customers arrive according to a Poisson process with parameter $\lambda$. Service times are independently and identically distributed with cumulative distribution function $B$.

List of Symbols:

$\mu^{-1}$                  Mean of service time distribution
$\nu^2$                     Second moment about the origin of service time distribution
$\rho = \lambda\mu^{-1}$    Queue utilization factor
$\Pi$                       Class of admissible policies
$\pi$                       Element of $\Pi$
$\mathcal{F}$               Class of Stationary Policies
$f$                         Element of $\mathcal{F}$

B.  List  of  Symbols  and  Key  Theorems  for  Discounted  Cost  Criterion 


List of Symbols:

$\beta$             Discount factor
$Ce^{-\beta t}$     Equivalent cost at time 0 of a cost $C$ incurred at time $t$
$C_i(k)$            Expected discounted cost of a transition when action $k$ is taken in state $i$
$V_\pi(i)$          Total expected discounted cost of using $\pi$ given the process begins at time 0 in state $i$
$V_\beta(i)$        Optimal discounted cost function where $V_\beta(i) = \inf_{\pi \in \Pi} V_\pi(i)$
$\pi^*$             $\pi^*$ is $\beta$-optimal if $V_{\pi^*}(i) = V_\beta(i)$ for all $i$
$f^*$               $f^*$ is stationary $\beta$-optimal if $f^* \in \mathcal{F}$ and $V_{f^*}(i) = V_\beta(i)$ for all $i$

Key  Theorems: 

THEOREM 3.10. (Functional Equation of Optimality)

$$V_\beta(i) = \min_k \Big\{ C_i(k) + \sum_{j=0}^{\infty} P_{ij}(k) \int_0^{\infty} e^{-\beta t} V_\beta(j)\, dF_{ij}(t \mid k) \Big\} \qquad (3.5)$$

(If $V_f(i)$ satisfies (3.5), then f is called a $\beta$-optimal improvement
policy.)
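The functional equation (3.5) can be explored numerically by successive approximation on a small finite example. The sketch below is an illustration under stated assumptions, not the report's method: the state space is finite, each transition time is exponential with a rate that depends only on the state and action (so the integral of $e^{-\beta t}$ reduces to rate / (rate + beta) and factors out of the sum over j), and all names and numbers are hypothetical.

```python
def discount_transform(rate, beta):
    """Integral of exp(-beta*t) dF(t) for an exponential F with the given rate."""
    return rate / (rate + beta)


def bellman_update(v, costs, probs, rates, beta):
    """Apply the right-hand side of the functional equation once to v."""
    n = len(v)
    updated = []
    for i in range(n):
        candidates = []
        for k in range(len(costs[i])):
            expected_next = sum(probs[i][k][j] * v[j] for j in range(n))
            candidates.append(
                costs[i][k] + discount_transform(rates[i][k], beta) * expected_next
            )
        updated.append(min(candidates))  # minimize over actions k
    return updated


def value_iteration(costs, probs, rates, beta, tol=1e-10, max_iter=100_000):
    """Iterate the update from v = 0 until successive iterates agree within tol."""
    v = [0.0] * len(costs)
    for _ in range(max_iter):
        new_v = bellman_update(v, costs, probs, rates, beta)
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical two-state, two-action example.
costs = [[1.0, 2.0], [4.0, 3.0]]                      # costs[i][k]
probs = [[[0.5, 0.5], [0.9, 0.1]],                    # probs[i][k][j]
         [[0.2, 0.8], [0.6, 0.4]]]
rates = [[1.0, 1.0], [1.0, 2.0]]                      # transition rates by (i, k)
beta = 0.1

v_beta = value_iteration(costs, probs, rates, beta)
residual = max(
    abs(a - b)
    for a, b in zip(bellman_update(v_beta, costs, probs, rates, beta), v_beta)
)
```

The residual measures how closely the computed vector satisfies the functional equation. The iteration converges here because every discount transform is strictly less than one.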

THEOREM  3.12, 

If $C_i(k) > 0$ for all i and k, then there exists a stationary
$\beta$-optimal policy.


THEOREM  3.17. 

If the condition of Theorem 3.12 holds, $V_{f^*}(i)$ satisfies (3.5) and
<  Q(i)  where  Q  is  a  polynomial  of  finite  degree  r,  increasing 

in  i  with 


$i = 0, 1, \ldots$;
$k = 1, 2, \ldots, K_i$


then $f^*$ is a $\beta$-optimal policy.
C. List of Symbols and Key Theorems for the Average Cost Case


List  of  Symbols: 


$C_i(k)$              Expected cost of transition if action k is taken in state i
$t_i(k)$              Expected time of transition if action k is taken in state i
$\phi_\pi$            Expected average cost per unit time if policy $\pi$ is used
$\phi$                $\phi = \inf_{\pi \in \Pi} \phi_\pi$
$\pi^*$               $\pi^*$ is optimal if $\phi_{\pi^*} = \phi$
$f^*$                 $f^*$ is stationary optimal if $f^* \in \mathcal{F}$ and $\phi_{f^*} = \phi$
$T_{i0}$              First passage time into state 0 given the process begins in state i
$C_{i0}$              Cost associated with first passage to state 0 given the process begins in state i
$V_\pi(i,\theta)$     Expected relative cost with respect to $\theta$ of first passage to state 0 given the process
                      begins in state i and policy $\pi$ is used, i.e., $V_\pi(i,\theta) = E_\pi[C_{i0} - \theta T_{i0}]$
$V(i,\theta)$         Optimal relative cost function, where $V(i,\theta) = \inf_{\pi \in \Pi} V_\pi(i,\theta)$


Key Theorems:

THEOREM  4.1.  (Functional  Equation  of  Optimality) 

For $i = 0, 1, 2, \ldots$,

$$V(i,\phi) = \min_k \Big\{ C_i(k) - \phi\, t_i(k) + \sum_{j=0}^{\infty} P_{ij}(k) V(j,\phi) \Big\} \qquad (4.3)$$

or  in  vector  notation 


$$V(\phi) = \min_f \{ C(f) - \phi\, t(f) + P(f) V(\phi) \}$$

where $\phi$ is determined from $V(0,\phi) = 0$. (If $V_f$ satisfies (4.3)
with $V_f(0,\phi_f) = 0$, then f is called a $\phi$-optimal improvement policy.)
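For a fixed stationary policy on a finite state space, the vector equation without the minimization, together with the normalization that the relative cost of state 0 is zero, is a linear system in the average cost and the remaining relative costs. The sketch below solves that system for a hypothetical two-state example. It only evaluates one policy and does not perform the minimization. The function names, the finite state space, and the data are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def evaluate_policy(cost, time, p):
    """Solve V = C - phi*t + P V with V(0) = 0 for (phi, V) under one policy."""
    n = len(cost)
    # Unknowns are (phi, V(1), ..., V(n-1)); V(0) is fixed at zero.
    a, b = [], []
    for i in range(n):
        row = [time[i]]  # coefficient of phi in row i
        for j in range(1, n):
            row.append((1.0 if i == j else 0.0) - p[i][j])
        a.append(row)
        b.append(cost[i])
    x = solve_linear(a, b)
    return x[0], [0.0] + x[1:]


# Hypothetical policy data: expected costs, expected times, transition matrix.
cost = [1.0, 3.0]
time = [1.0, 2.0]
p = [[0.5, 0.5], [0.5, 0.5]]

phi, v = evaluate_policy(cost, time, p)
```

For this example the stationary distribution of the transition matrix is (0.5, 0.5), so the average cost per unit time is (0.5 + 1.5) / (0.5 + 1.0) = 4/3, which agrees with the value of phi returned by the solve.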

THEOREM  4.3. 

If the $C_i(k)$ are bounded below, $\phi_\pi \le M < \infty$ for some $\pi$, and
$S = \{i : C_i(k) - M t_i(k) < 0 \text{ for some } k\}$ is finite, then $V(\phi)$ exists
and is bounded below.

Assumptions  common  to  remaining  theorems: 

A(1): $\Pi_M = \{\pi : \phi_\pi \le M\}$ is non-empty where $0 \le M < \infty$.

A(2): There exists a state, say state 0, that is positive
recurrent over all $\pi \in \Pi_M$.

A(3): $V(\phi)$ exists and is bounded below.

THEOREM  4.4. 

If  A(l),  A(2),  and  A(3)  hold,  then  a  stationary  optimal  policy  f* 
exists,  and  it  satisfies  the  relationship 

$$V_{f^*} = \min_f [\, C(f) - \phi_{f^*}\, t(f) + P(f) V_{f^*} \,] \qquad (4.4)$$

where $\phi_{f^*}$ is determined from $V_{f^*}(0, \phi_{f^*}) = 0$.


THEOREM  4.8. 

If assumptions A(1), A(2), A(3) hold and $f \in \mathcal{F}$ is such that
$V_{f^*}$ and $\phi_{f^*}$ satisfy (4.4) with $x(f)V_{f^*}$

is the stationary distribution of the imbedded Markov chain associated with
f, then f is optimal over $\Pi$.

COROLLARY  4.9. 

If assumptions A(1), A(2), A(3) hold and $f^* \in \mathcal{F}$ is such that
$V_{f^*}$ and $\phi_{f^*}$ satisfy (4.4) with $V_{f^*} \le M^*$, then $f^*$ is optimal
over $\Pi$.